SPARK-20866

Dataset map does not respect nullable field

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.1.0
    • Fix Version/s: 2.2.0
    • Component/s: SQL
    • Labels: None

    Description

      Dataset.map does not preserve the nullability of fields in the schema: a field that is non-nullable in the input Dataset becomes nullable in the result.

      Test code (run in spark-shell 2.1.0):

      scala> case class Test(a: Int)
      defined class Test
      
      scala> val ds1 = (Test(10) :: Nil).toDS
      ds1: org.apache.spark.sql.Dataset[Test] = [a: int]
      
      scala> val ds2 = ds1.map(x => Test(x.a))
      ds2: org.apache.spark.sql.Dataset[Test] = [a: int]
      
      scala> ds1.schema == ds2.schema
      res65: Boolean = false
      
      scala> ds1.schema
      res62: org.apache.spark.sql.types.StructType = StructType(StructField(a,IntegerType,false))
      
      scala> ds2.schema
      res63: org.apache.spark.sql.types.StructType = StructType(StructField(a,IntegerType,true))
      

      Expected
      ds1.schema should equal ds2.schema, i.e. field a should remain non-nullable after the map.

      Actual
      The schemas are not equal: the nullable property of StructField a is true in ds2 but false in ds1.
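
      To pinpoint which fields differ only in nullability, a small helper along these lines may be useful. This is a minimal sketch: nullableMismatches is a hypothetical name, not part of Spark, and it only compares top-level fields position by position. The two schemas are rebuilt by hand from the ds1.schema and ds2.schema output shown in the test code.

```scala
import org.apache.spark.sql.types.{IntegerType, StructField, StructType}

// Hypothetical helper (not a Spark API): names of top-level fields that agree
// in name and data type but differ in their nullable flag.
def nullableMismatches(left: StructType, right: StructType): Seq[String] =
  left.fields.zip(right.fields).collect {
    case (l, r)
        if l.name == r.name && l.dataType == r.dataType && l.nullable != r.nullable =>
      l.name
  }.toSeq

// Schemas copied from the ds1.schema and ds2.schema output in this report.
val ds1Schema = StructType(Seq(StructField("a", IntegerType, nullable = false)))
val ds2Schema = StructType(Seq(StructField("a", IntegerType, nullable = true)))

val mismatches = nullableMismatches(ds1Schema, ds2Schema)
```

      In a spark-shell session, ds1.schema and ds2.schema could be passed directly instead of the hand-built schemas.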

            People

              Assignee: Unassigned
              Reporter: Colin Breame (colinbreame)