Description
I constructed a DataFrame with a nullable `java.lang.Double` column (plus a primitive `Double` column), then converted it to a Dataset with `as[(Double, Double)]`. When the Dataset is shown, the null is displayed correctly. But when the Dataset is collected and printed, the null is silently converted to -1.0.
Code snippet to reproduce this:
```scala
val localSpark = spark
import localSpark.implicits._

val df = Seq[(java.lang.Double, Double)](
  (1.0, 2.0),
  (3.0, 4.0),
  (Double.NaN, 5.0),
  (null, 6.0)
).toDF("a", "b")
df.show()         // OUTPUT 1: has null
df.printSchema()

val data = df.as[(Double, Double)]
data.show()       // OUTPUT 2: has null
data.collect().foreach(println)  // OUTPUT 3: has -1.0
```
OUTPUT 1 and 2:
```
+----+---+
|   a|  b|
+----+---+
| 1.0|2.0|
| 3.0|4.0|
| NaN|5.0|
|null|6.0|
+----+---+
```
OUTPUT 3:
```
(1.0,2.0)
(3.0,4.0)
(NaN,5.0)
(-1.0,6.0)
```
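For reference, a workaround sketch (my own addition, not part of the original report): since Scala's primitive `Double` cannot represent null, declaring the nullable column as `Option[Double]` lets the encoder preserve the null as `None` on collect instead of substituting a sentinel. This assumes a `SparkSession` named `spark` is already in scope, as in the snippet above.

```scala
// Sketch of a workaround, assuming a SparkSession named `spark` is in scope.
// Option[Double] keeps the null as None instead of a sentinel value.
val localSpark = spark
import localSpark.implicits._

val df = Seq[(java.lang.Double, Double)](
  (1.0, 2.0),
  (null, 6.0)
).toDF("a", "b")

val data = df.as[(Option[Double], Double)]
data.collect().foreach(println)
// The null row should collect as (None,6.0) rather than (-1.0,6.0)
```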