Spark / SPARK-35653

[SQL] CatalystToExternalMap interpreted path fails for Map with case classes as keys or values

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.0.2, 3.1.2, 3.2.0
    • Fix Version/s: 3.0.3, 3.1.3, 3.2.0
    • Component/s: SQL
    • Labels: None

    Description

      Interpreted path deserialization fails for Map with case classes as keys or values while the codegen path works correctly.

      To reproduce the issue, add a test case to ExpressionEncoderSuite, for example:

      case class IntAndString(i: Int, s: String)
      encodeDecodeTest(Map(1 -> IntAndString(1, "a")), "map with case class as value")
      

      The test succeeds on the codegen path, while the interpreted path fails with:

      [info] - encode/decode for map with case class as value: Map(1 -> IntAndString(1,a)) (interpreted path) *** FAILED *** (64 milliseconds)
      [info] Encoded/Decoded data does not match input data
      [info]
      [info] in: Map(1 -> IntAndString(1,a))
      [info] out: Map(1 -> [1,a])
      [info] types: scala.collection.immutable.Map$Map1
      [info]
      [info] Encoded Data: [org.apache.spark.sql.catalyst.expressions.UnsafeMapData@5ecf5d9e]
      [info] Schema: value#823
      [info] root
      [info]  |-- value: map (nullable = true)
      [info]  |    |-- key: integer
      [info]  |    |-- value: struct (valueContainsNull = true)
      [info]  |    |    |-- i: integer (nullable = false)
      [info]  |    |    |-- s: string (nullable = true)
      [info]
      [info]
      [info] fromRow Expressions:
      [info] catalysttoexternalmap(lambdavariable(CatalystToExternalMap_key, IntegerType, false, 178), lambdavariable(CatalystToExternalMap_key, IntegerType, false, 178), lambdavariable(CatalystToExternalMap_value, StructField(i,IntegerType,false), StructField(s,StringType,true), true, 179), if (isnull(lambdavariable(CatalystToExternalMap_value, StructField(i,IntegerType,false), StructField(s,StringType,true), true, 179))) null else newInstance(class org.apache.spark.sql.catalyst.encoders.IntAndString), input[0, map<int,struct<i:int,s:string>>, true], interface scala.collection.immutable.Map
      [info] :- lambdavariable(CatalystToExternalMap_key, IntegerType, false, 178)
      [info] :- lambdavariable(CatalystToExternalMap_key, IntegerType, false, 178)
      [info] :- lambdavariable(CatalystToExternalMap_value, StructField(i,IntegerType,false), StructField(s,StringType,true), true, 179)
      [info] :- if (isnull(lambdavariable(CatalystToExternalMap_value, StructField(i,IntegerType,false), StructField(s,StringType,true), true, 179))) null else newInstance(class org.apache.spark.sql.catalyst.encoders.IntAndString)
      [info] : :- isnull(lambdavariable(CatalystToExternalMap_value, StructField(i,IntegerType,false), StructField(s,StringType,true), true, 179))
      [info] : : +- lambdavariable(CatalystToExternalMap_value, StructField(i,IntegerType,false), StructField(s,StringType,true), true, 179)
      [info] : :- null
      [info] : +- newInstance(class org.apache.spark.sql.catalyst.encoders.IntAndString)
      [info] : :- assertnotnull(lambdavariable(CatalystToExternalMap_value, StructField(i,IntegerType,false), StructField(s,StringType,true), true, 179).i)
      [info] : : +- lambdavariable(CatalystToExternalMap_value, StructField(i,IntegerType,false), StructField(s,StringType,true), true, 179).i
      [info] : : +- lambdavariable(CatalystToExternalMap_value, StructField(i,IntegerType,false), StructField(s,StringType,true), true, 179)
      [info] : +- lambdavariable(CatalystToExternalMap_value, StructField(i,IntegerType,false), StructField(s,StringType,true), true, 179).s.toString
      [info] : +- lambdavariable(CatalystToExternalMap_value, StructField(i,IntegerType,false), StructField(s,StringType,true), true, 179).s
      [info] : +- lambdavariable(CatalystToExternalMap_value, StructField(i,IntegerType,false), StructField(s,StringType,true), true, 179)
      [info] +- input[0, map<int,struct<i:int,s:string>>, true] (ExpressionEncoderSuite.scala:627)
      

      So the interpreted path does not deserialize the value correctly: the output map holds the raw struct [1,a] where IntAndString(1,a) is expected.
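
      To make the in/out mismatch above concrete, here is a toy model in plain Scala. It does not use Spark and does not reproduce the actual cause inside CatalystToExternalMap; GenericRow, encode, decodeWithConversion and decodeWithoutConversion are hypothetical names introduced only for illustration. It shows how a decoder that leaves struct values as generic rows produces a map that prints as Map(1 -> [1,a]) and no longer equals the input.

```scala
// Toy model only: hypothetical stand-ins, not Spark classes or Spark's logic.
object MapDecodeSketch {
  final case class IntAndString(i: Int, s: String)

  // Stand-in for a generic row: an ordered sequence of untyped field values.
  final case class GenericRow(fields: Seq[Any]) {
    override def toString: String = fields.mkString("[", ",", "]")
  }

  // Encoded form of Map[Int, IntAndString]: each value becomes a struct-like row.
  def encode(m: Map[Int, IntAndString]): Map[Int, GenericRow] =
    m.map { case (k, v) => k -> GenericRow(Seq(v.i, v.s)) }

  // Rebuild the case class from the fields of a row.
  def toCaseClass(row: GenericRow): IntAndString =
    IntAndString(row.fields(0).asInstanceOf[Int], row.fields(1).asInstanceOf[String])

  // Decoder that converts every value back to the case class.
  def decodeWithConversion(enc: Map[Int, GenericRow]): Map[Int, Any] =
    enc.map { case (k, row) => k -> (toCaseClass(row): Any) }

  // Decoder that skips the per-value conversion and returns the rows as-is.
  def decodeWithoutConversion(enc: Map[Int, GenericRow]): Map[Int, Any] =
    enc.map { case (k, row) => k -> (row: Any) }

  def main(args: Array[String]): Unit = {
    val in = Map(1 -> IntAndString(1, "a"))
    val encoded = encode(in)
    val converted = decodeWithConversion(encoded)
    val unconverted = decodeWithoutConversion(encoded)
    println(s"in:          $in")
    println(s"converted:   $converted (equals input: ${converted == in})")
    println(s"unconverted: $unconverted (equals input: ${unconverted == in})")
  }
}
```

      In this sketch the unconverted result compares unequal to the input, which is the same kind of failure that encodeDecodeTest reports for the interpreted path.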

      I have prepared a PR that fixes this issue and will submit it.

          People

            Assignee: Emil Ejbyfeldt (eejbyfeldt)
            Reporter: Emil Ejbyfeldt (eejbyfeldt)