
[SPARK-12478] Dataset fields of product types can't be null


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.6.0, 2.0.0
    • Fix Version/s: 1.6.1, 2.0.0
    • Component/s: SQL

    Description

      Spark shell snippet for reproduction:

      import sqlContext.implicits._

      case class Inner(f: Int)
      case class Outer(i: Inner)

      // Round-tripping through a DataFrame prints a spurious nested row,
      // while showing the Dataset directly fails when decoding rows back to Outer.
      Seq(Outer(null)).toDS().toDF().show()
      Seq(Outer(null)).toDS().show()
      

      Expected output:

      +----+
      |   i|
      +----+
      |null|
      +----+
      
      +----+
      |   i|
      +----+
      |null|
      +----+
      

      Actual output:

      +------+
      |     i|
      +------+
      |[null]|
      +------+
      
      java.lang.RuntimeException: Error while decoding: java.lang.RuntimeException: Null value appeared in non-nullable field Inner.f of type scala.Int. If the schema is inferred from a Scala tuple/case class, or a Java bean, please try to use scala.Option[_] or other nullable types (e.g. java.lang.Integer instead of int/scala.Int).
      newinstance(class $iwC$$iwC$Outer,if (isnull(input[0, StructType(StructField(f,IntegerType,false))])) null else newinstance(class $iwC$$iwC$Inner,assertnotnull(input[0, StructType(StructField(f,IntegerType,false))].f,Inner,f,scala.Int),false,ObjectType(class $iwC$$iwC$Inner),Some($iwC$$iwC@6616b9e0)),false,ObjectType(class $iwC$$iwC$Outer),Some($iwC$$iwC@6ab35ce3))
      +- if (isnull(input[0, StructType(StructField(f,IntegerType,false))])) null else newinstance(class $iwC$$iwC$Inner,assertnotnull(input[0, StructType(StructField(f,IntegerType,false))].f,Inner,f,scala.Int),false,ObjectType(class $iwC$$iwC$Inner),Some($iwC$$iwC@6616b9e0))
         :- isnull(input[0, StructType(StructField(f,IntegerType,false))])
         :  +- input[0, StructType(StructField(f,IntegerType,false))]
         :- null
         +- newinstance(class $iwC$$iwC$Inner,assertnotnull(input[0, StructType(StructField(f,IntegerType,false))].f,Inner,f,scala.Int),false,ObjectType(class $iwC$$iwC$Inner),Some($iwC$$iwC@6616b9e0))
            +- assertnotnull(input[0, StructType(StructField(f,IntegerType,false))].f,Inner,f,scala.Int)
               +- input[0, StructType(StructField(f,IntegerType,false))].f
                  +- input[0, StructType(StructField(f,IntegerType,false))]
      
              at org.apache.spark.sql.catalyst.encoders.ExpressionEncoder.fromRow(ExpressionEncoder.scala:224)
              at org.apache.spark.sql.Dataset$$anonfun$collect$2.apply(Dataset.scala:704)
              at org.apache.spark.sql.Dataset$$anonfun$collect$2.apply(Dataset.scala:704)
              at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
              at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
              at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
              at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108)
              at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
              at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:108)
              at org.apache.spark.sql.Dataset.collect(Dataset.scala:704)
              at org.apache.spark.sql.Dataset.take(Dataset.scala:725)
              at org.apache.spark.sql.Dataset.showString(Dataset.scala:240)
              at org.apache.spark.sql.Dataset.show(Dataset.scala:230)
              at org.apache.spark.sql.Dataset.show(Dataset.scala:193)
              at org.apache.spark.sql.Dataset.show(Dataset.scala:201)
              at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:33)
              at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:38)
              at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:40)
              at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:42)
              at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:44)
              at $iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:46)
              at $iwC$$iwC$$iwC$$iwC.<init>(<console>:48)
              at $iwC$$iwC$$iwC.<init>(<console>:50)
              at $iwC$$iwC.<init>(<console>:52)
              at $iwC.<init>(<console>:54)
              at <init>(<console>:56)
              at .<init>(<console>:60)
              at .<clinit>(<console>)
              at .<init>(<console>:7)
              at .<clinit>(<console>)
              at $print(<console>)
              at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
              at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
              at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
              at java.lang.reflect.Method.invoke(Method.java:483)
              at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:1045)
              at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1326)
              at org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:821)
              at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:852)
              at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:800)
              at org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:857)
              at org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:902)
              at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:814)
              at org.apache.spark.repl.SparkILoop.processLine$1(SparkILoop.scala:657)
              at org.apache.spark.repl.SparkILoop.innerLoop$1(SparkILoop.scala:665)
              at org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$loop(SparkILoop.scala:670)
              at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply$mcZ$sp(SparkILoop.scala:997)
              at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
              at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
              at scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135)
              at org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$process(SparkILoop.scala:945)
              at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:1064)
              at org.apache.spark.repl.Main$.main(Main.scala:31)
              at org.apache.spark.repl.Main.main(Main.scala)
              at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
              at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
              at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
              at java.lang.reflect.Method.invoke(Method.java:483)
              at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
              at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
              at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
              at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
              at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
      Caused by: java.lang.RuntimeException: Null value appeared in non-nullable field Inner.f of type scala.Int. If the schema is inferred from a Scala tuple/case class, or a Java bean, please try to use scala.Option[_] or other nullable types (e.g. java.lang.Integer instead of int/scala.Int).
              at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificSafeProjection.apply(Unknown Source)
              at org.apache.spark.sql.catalyst.encoders.ExpressionEncoder.fromRow(ExpressionEncoder.scala:221)
              ... 62 more
      

      We can see an unexpected extra nested row in the first output: Outer(null) is encoded as a row wrapping a null field ([null]) instead of as a top-level null. Decoding that row back to Outer then feeds null into the non-nullable field Inner.f, which triggers the exception shown above.
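
      As the error message itself suggests, a workaround until the fixed versions is to avoid null for the nested value and model absence with scala.Option instead. A minimal sketch (hypothetical, not from this report; it sidesteps the bug rather than fixing the encoder):

      import sqlContext.implicits._

      case class Inner(f: Int)
      // Wrapping the nested product type in Option lets a missing value be None
      // instead of null, so the encoder never sees a null product value.
      case class Outer(i: Option[Inner])

      Seq(Outer(None)).toDS().show()
      Seq(Outer(Some(Inner(1)))).toDS().show()

      With Option, show() should render the missing value as a null cell rather than throwing.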

          People

            Assignee: Apache Spark
            Reporter: Cheng Lian
