Details
Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Version/s: 3.4.0, 3.5.0, 4.0.0
Description
Follow-up on https://issues.apache.org/jira/browse/SPARK-40646.
I found that JSON parsing is significantly slower because exceptions are created as part of normal control flow. In addition, some fields are not parsed correctly, and in certain cases the following exception is thrown:
Caused by: java.lang.ClassCastException: org.apache.spark.sql.catalyst.util.GenericArrayData cannot be cast to org.apache.spark.sql.catalyst.InternalRow
at org.apache.spark.sql.catalyst.expressions.BaseGenericInternalRow.getStruct(rows.scala:51)
at org.apache.spark.sql.catalyst.expressions.BaseGenericInternalRow.getStruct$(rows.scala:51)
at org.apache.spark.sql.catalyst.expressions.GenericInternalRow.getStruct(rows.scala:195)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.apply(Unknown Source)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.apply(Unknown Source)
at scala.collection.Iterator$$anon$10.next(Iterator.scala:461)
at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1$$anon$2.getNext(FileScanRDD.scala:590)
... 39 more
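Both symptoms can be illustrated outside Spark with a short, self-contained Java sketch. Everything in it (`CheapParseException`, `GenericRow`, `ArrayValue`, `readFirstAsStruct`) is a hypothetical, simplified stand-in rather than Spark code. Skipping stack-trace capture is shown only as one common JVM mitigation for exceptions on a hot path, not as what the eventual fix did. The first part builds an exception without a stack trace. The second part shows why a value stored with the wrong type fails only later, when a typed getter (compare `getStruct` in the trace above) casts it.

```java
// Hypothetical, simplified stand-ins for illustration only; none of these
// classes are Spark APIs.
public class JsonIssueSketch {

    // (1) Cheaper exceptions for control flow. A normal Throwable constructor
    // calls fillInStackTrace(), which walks the whole call stack. Passing
    // writableStackTrace = false to the protected RuntimeException constructor
    // skips that walk, so the exception carries no stack trace.
    static final class CheapParseException extends RuntimeException {
        CheapParseException(String message) {
            super(message, null, false, false);
        }
    }

    // (2) Why the ClassCastException appears only on read. A generic row stores
    // untyped Objects; its typed getter trusts the schema and casts blindly.
    interface Row { }

    static final class GenericRow implements Row {
        private final Object[] values;
        GenericRow(Object... values) { this.values = values; }
        // Analogous to getStruct: casts the stored value to Row.
        Row getStruct(int ordinal) { return (Row) values[ordinal]; }
    }

    // Stands in for an array value (compare GenericArrayData); it is not a Row.
    static final class ArrayValue {
        final Object[] elements;
        ArrayValue(Object... elements) { this.elements = elements; }
    }

    // Returns "ok" if field 0 can be read as a struct, otherwise the simple
    // name of the exception thrown by the cast.
    static String readFirstAsStruct(GenericRow row) {
        try {
            row.getStruct(0);
            return "ok";
        } catch (ClassCastException e) {
            return e.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        CheapParseException cheap = new CheapParseException("bad field");
        System.out.println(cheap.getStackTrace().length); // 0: no trace captured

        // Schema expects a struct and a struct was stored: the read succeeds.
        GenericRow good = new GenericRow(new GenericRow(1, "a"));
        // Schema expects a struct but an array was stored. Building the row
        // succeeds; the mismatch only surfaces when the field is read.
        GenericRow bad = new GenericRow(new ArrayValue(1, 2));
        System.out.println(readFirstAsStruct(good)); // ok
        System.out.println(readFirstAsStruct(bad));  // ClassCastException
    }
}
```

The key point of the second part is that nothing checks the stored value's type when the row is built, so a parser that puts an array where the schema declares a struct produces a row that fails only at projection time, far from the parsing code.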
Issue Links
- relates to SPARK-40646 Fix returning partial results in JSON data source and JSON functions (Resolved)