  1. Spark
  2. SPARK-26984

Incompatibility between Spark releases - Some(null)

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Minor
    • Resolution: Not A Problem
    • Affects Version/s: 2.4.0
    • Fix Version/s: None
    • Component/s: Spark Core
    • Environment: Linux CentOS, Databricks.

    Description

      Please refer to https://stackoverflow.com/questions/54851205/why-does-somenull-throw-nullpointerexception-in-spark-2-4-but-worked-in-2-2/54861152#54861152.

      NB: I am not sure the priority is set correctly; no doubt someone will evaluate it.

      Consider the following:

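      // toDF needs the session implicits; spark-shell and Databricks notebooks
      // import them already (assumes a SparkSession named `spark`).
      import spark.implicits._

      val df = Seq(
        (1, Some("a"), Some(1)),
        (2, Some(null), Some(2)),  // Some(null): the value that triggers the error on 2.4.0
        (3, Some("c"), Some(3)),
        (4, None, None)).toDF("c1", "c2", "c3")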
      

      In Spark 2.2.1 (on MapR), Some(null) works fine; in Spark 2.4.0 on Databricks, it raises the following error:

      java.lang.RuntimeException: Error while encoding: java.lang.NullPointerException
      assertnotnull(assertnotnull(input[0, scala.Tuple3, true]))._1 AS _1#6
      staticinvoke(class org.apache.spark.unsafe.types.UTF8String, StringType, fromString, unwrapoption(ObjectType(class java.lang.String), assertnotnull(assertnotnull(input[0, scala.Tuple3, true]))._2), true, false) AS _2#7
      unwrapoption(IntegerType, assertnotnull(assertnotnull(input[0, scala.Tuple3, true]))._3) AS _3#8
        at org.apache.spark.sql.catalyst.encoders.ExpressionEncoder.toRow(ExpressionEncoder.scala:293)
        at org.apache.spark.sql.SparkSession.$anonfun$createDataset$1(SparkSession.scala:472)
        at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:233)
        at scala.collection.immutable.List.foreach(List.scala:388)
        at scala.collection.TraversableLike.map(TraversableLike.scala:233)
        at scala.collection.TraversableLike.map$(TraversableLike.scala:226)
        at scala.collection.immutable.List.map(List.scala:294)
        at org.apache.spark.sql.SparkSession.createDataset(SparkSession.scala:472)
        at org.apache.spark.sql.SQLContext.createDataset(SQLContext.scala:377)
        at org.apache.spark.sql.SQLImplicits.localSeqToDatasetHolder(SQLImplicits.scala:228)
        ... 57 elided
      Caused by: java.lang.NullPointerException
        at org.apache.spark.sql.catalyst.expressions.codegen.UnsafeWriter.write(UnsafeWriter.java:109)
        at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.apply(Unknown Source)
        at org.apache.spark.sql.catalyst.encoders.ExpressionEncoder.toRow(ExpressionEncoder.scala:289)
        ... 66 more
      

      One can argue that this is solvable in other ways, but existing code bases may well be affected.
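One such alternative can be sketched in plain Scala, assuming the null in the report was meant to stand for a missing value. Option(x) maps a null reference to None, whereas Some(null) is a defined Option that still holds a null. The object name SomeNullDemo and the value names raw, wrapped, normalised and rows are hypothetical, chosen only for illustration; nothing here touches Spark itself.

```scala
// Minimal sketch, standard library only (no Spark required).
object SomeNullDemo {
  // Hypothetical nullable input, e.g. a value coming from Java code.
  val raw: String = null

  // Some(null) is a defined Option that wraps a null reference.
  val wrapped: Option[String] = Some(raw)

  // Option.apply turns a null reference into None.
  val normalised: Option[String] = Option(raw)

  // The rows from the report, with Option(...) in place of Some(...),
  // so the null in row 2 becomes None.
  val rows: Seq[(Int, Option[String], Option[Int])] = Seq(
    (1, Option("a"), Option(1)),
    (2, Option(raw), Option(2)),
    (3, Option("c"), Option(3)),
    (4, None, None))

  def main(args: Array[String]): Unit = {
    println(wrapped.isDefined)   // true: the Option is defined, its content is null
    println(normalised.isEmpty)  // true: the null became None
    println(rows(1)._2)          // None
  }
}
```

With a SparkSession and its implicits in scope, rows.toDF("c1", "c2", "c3") would be the natural next step; whether that matches the behaviour wanted by existing code that relied on Some(null) is a separate question.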

       

       

          People

            Assignee: Unassigned
            Reporter: Gerard Alexander (thebluephantom)
            Jacek Laskowski
            Votes: 0
            Watchers: 4
