Description
Dataset.map does not preserve the nullable flag of the fields in the schema.
Test code:
(run on spark-shell 2.1.0):
scala> case class Test(a: Int)
defined class Test

scala> val ds1 = (Test(10) :: Nil).toDS
ds1: org.apache.spark.sql.Dataset[Test] = [a: int]

scala> val ds2 = ds1.map(x => Test(x.a))
ds2: org.apache.spark.sql.Dataset[Test] = [a: int]

scala> ds1.schema == ds2.schema
res65: Boolean = false

scala> ds1.schema
res62: org.apache.spark.sql.types.StructType = StructType(StructField(a,IntegerType,false))

scala> ds2.schema
res63: org.apache.spark.sql.types.StructType = StructType(StructField(a,IntegerType,true))
Expected
The schema of ds1 should equal the schema of ds2, including the nullable flag of each field.
Actual
The schemas are not equal: the StructField for a has nullable = true in ds2 but nullable = false in ds1.
Issue Links
- duplicates SPARK-18284 Scheme of DataFrame generated from RDD is different between master and 2.0 (Resolved)