[SPARK-20866] Dataset map does not respect nullable field - ASF JIRA

Details

Type: Bug
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: 2.1.0
Fix Version/s: 2.2.0
Component/s: SQL
Labels:
None

Description

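Dataset.map does not preserve field nullability in the schema: a field that is non-nullable in the input Dataset becomes nullable in the result, even when the element type is unchanged.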

Test code (run on spark-shell 2.1.0):

scala> case class Test(a: Int)
defined class Test

scala> val ds1 = (Test(10) :: Nil).toDS
ds1: org.apache.spark.sql.Dataset[Test] = [a: int]

scala> val ds2 = ds1.map(x => Test(x.a))
ds2: org.apache.spark.sql.Dataset[Test] = [a: int]

scala> ds1.schema == ds2.schema
res65: Boolean = false

scala> ds1.schema
res62: org.apache.spark.sql.types.StructType = StructType(StructField(a,IntegerType,false))

scala> ds2.schema
res63: org.apache.spark.sql.types.StructType = StructType(StructField(a,IntegerType,true))

Expected
ds1 and ds2 should have the same schema.

Actual
The schemas differ: the nullable property of the StructField for a is true in ds2 and false in ds1.
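
For anyone reproducing this outside spark-shell, the following is a minimal standalone sketch rather than anything taken from the fix itself. The object name NullableSchemaSketch and the helper sameIgnoringNullable are hypothetical names introduced for illustration, and the local SparkSession settings are assumptions. It repeats the comparison from the report, compares the schemas by field name and type only, and tries one possible workaround: re-applying the original schema to the mapped rows.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types.StructType

// Same shape as the case class in the report. Defined at the top level so
// Spark can derive an encoder for it.
case class Test(a: Int)

object NullableSchemaSketch {

  // Hypothetical helper: compares two schemas by field name and data type,
  // ignoring the nullable flag of the top-level fields. Nullability nested
  // inside complex types (arrays, maps, structs) is still compared.
  def sameIgnoringNullable(x: StructType, y: StructType): Boolean =
    x.fields.map(f => (f.name, f.dataType))
      .sameElements(y.fields.map(f => (f.name, f.dataType)))

  def main(args: Array[String]): Unit = {
    // Assumed local setup; any existing SparkSession would do.
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("spark-20866-sketch")
      .getOrCreate()
    import spark.implicits._

    val ds1 = Seq(Test(10)).toDS()
    val ds2 = ds1.map(x => Test(x.a))

    // The report shows this comparison returning false on 2.1.0.
    println(s"schemas equal: ${ds1.schema == ds2.schema}")
    println(s"equal ignoring nullable: ${sameIgnoringNullable(ds1.schema, ds2.schema)}")

    // Possible workaround (an assumption, not part of the report):
    // rebuild the Dataset from its rows with the original schema.
    val ds3 = spark.createDataFrame(ds2.toDF().rdd, ds1.schema).as[Test]
    println(s"schemas equal after re-applying schema: ${ds3.schema == ds1.schema}")

    spark.stop()
  }
}
```

The helper is useful mainly in tests that should not depend on the nullable flag. Re-applying the schema goes through an RDD of rows, so it is not free on large data.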

Issue Links

duplicates: SPARK-18284 Scheme of DataFrame generated from RDD is different between master and 2.0 (Resolved)

People

Assignee: Unassigned
Reporter: Colin Breame
Votes: 0
Watchers: 3

Dates

Created: 24/May/17 12:41
Updated: 25/May/17 07:37
Resolved: 25/May/17 07:37