Spark / SPARK-47704

JSON parsing fails with "java.lang.ClassCastException: org.apache.spark.sql.catalyst.util.ArrayBasedMapData cannot be cast to org.apache.spark.sql.catalyst.util.ArrayData" when spark.sql.json.enablePartialResults is enabled


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 4.0.0, 3.5.1
    • Fix Version/s: 4.0.0
    • Component/s: SQL

    Description

      When reading the following JSON {"a":[{"key":{"b":0}}]}: 

      val df = spark.read.schema("a array<map<string, struct<b boolean>>>").json(path)

      Spark throws an exception:

      Cause: java.lang.ClassCastException: class org.apache.spark.sql.catalyst.util.ArrayBasedMapData cannot be cast to class org.apache.spark.sql.catalyst.util.ArrayData (org.apache.spark.sql.catalyst.util.ArrayBasedMapData and org.apache.spark.sql.catalyst.util.ArrayData are in unnamed module of loader 'app')
      at org.apache.spark.sql.catalyst.expressions.BaseGenericInternalRow.getArray(rows.scala:53)
      at org.apache.spark.sql.catalyst.expressions.BaseGenericInternalRow.getArray$(rows.scala:53)
      at org.apache.spark.sql.catalyst.expressions.GenericInternalRow.getArray(rows.scala:172)
      at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.apply(Unknown Source)
      at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.apply(Unknown Source)
      at scala.collection.Iterator$$anon$10.next(Iterator.scala:461)
      at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1$$anon$2.getNext(FileScanRDD.scala:605)
      at org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:73)
      at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.$anonfun$prepareNextFile$1(FileScanRDD.scala:884)
      at scala.concurrent.Future$.$anonfun$apply$1(Future.scala:659) 

       

      The same happens for the map input {"a":{"key":[{"b":0}]}} when the array and map types are swapped:

      val df = spark.read.schema("a map<string, array<struct<b boolean>>>").json(path) 

       

      This is a corner case that https://issues.apache.org/jira/browse/SPARK-44940 missed.
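      Both failures can be reproduced together with a sketch along the following lines. This is a hypothetical standalone repro, not taken from the issue: the session setup, file paths, and the explicit config call are placeholder assumptions; the config key itself, `spark.sql.json.enablePartialResults`, is the one named in the title.

      ```scala
      import org.apache.spark.sql.SparkSession

      // Hypothetical repro sketch; assumes a local Spark 3.5.x build.
      val spark = SparkSession.builder()
        .master("local[1]")
        .config("spark.sql.json.enablePartialResults", "true") // the flag under which the bug appears
        .getOrCreate()

      import spark.implicits._

      // Case 1: array<map<...>> input from the description.
      // 0 is not a valid boolean, so partial-results handling is exercised
      // and the read fails with the ClassCastException shown above.
      val arrayPath = "/tmp/partial_results_array.json" // placeholder path
      Seq("""{"a":[{"key":{"b":0}}]}""").toDS.write.mode("overwrite").text(arrayPath)
      spark.read.schema("a array<map<string, struct<b boolean>>>").json(arrayPath).show()

      // Case 2: the swapped map<string, array<...>> input.
      val mapPath = "/tmp/partial_results_map.json" // placeholder path
      Seq("""{"a":{"key":[{"b":0}]}}""").toDS.write.mode("overwrite").text(mapPath)
      spark.read.schema("a map<string, array<struct<b boolean>>>").json(mapPath).show()
      ```

      With partial results disabled, the same reads instead null out the unparseable field rather than casting the intermediate MapData/ArrayData values incorrectly.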

      Attachments

      Issue Links

      Activity

      People

        Assignee: Ivan Sadikov (ivan.sadikov)
        Reporter: Ivan Sadikov (ivan.sadikov)
        Votes: 0
        Watchers: 2

      Dates

        Created:
        Updated:
        Resolved: