Spark / SPARK-23835

When Dataset.as converts column from nullable to non-nullable type, null Doubles are converted silently to -1


    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.3.0
    • Fix Version/s: 2.3.1, 2.4.0
    • Component/s: SQL
    • Labels:
      None

      Description

      I constructed a DataFrame with a nullable java.lang.Double column (plus a second, non-nullable Double column), then converted it to a Dataset with as[(Double, Double)]. When the Dataset is shown, the null appears as expected. When it is collected and printed, however, the null is silently converted to -1.0.

      Code snippet to reproduce this:

      val localSpark = spark
      import localSpark.implicits._
      val df = Seq[(java.lang.Double, Double)](
        (1.0, 2.0),
        (3.0, 4.0),
        (Double.NaN, 5.0),
        (null, 6.0)
      ).toDF("a", "b")
      df.show()  // OUTPUT 1: has null
      
      df.printSchema()
      val data = df.as[(Double, Double)]
      data.show()  // OUTPUT 2: has null
      data.collect().foreach(println)  // OUTPUT 3: has -1
      

      OUTPUT 1 and 2:

      +----+---+
      |   a|  b|
      +----+---+
      | 1.0|2.0|
      | 3.0|4.0|
      | NaN|5.0|
      |null|6.0|
      +----+---+
      

      OUTPUT 3:

      (1.0,2.0)
      (3.0,4.0)
      (NaN,5.0)
      (-1.0,6.0)
      
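
      A sketch of a workaround until the fix lands: encode the nullable column as Option[Double] instead of the primitive Double, so nulls survive collect() as None rather than being coerced to a sentinel value. (This assumes the same SparkSession setup as the snippet above.)

      val localSpark = spark
      import localSpark.implicits._
      val df = Seq[(java.lang.Double, Double)](
        (1.0, 2.0),
        (null, 6.0)
      ).toDF("a", "b")
      // Option[Double] preserves SQL NULL as None instead of -1.0
      val data = df.as[(Option[Double], Double)]
      data.collect().foreach(println)
      // (Some(1.0),2.0)
      // (None,6.0)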


            People

            • Assignee:
              mgaido Marco Gaido
            • Reporter:
              josephkb Joseph K. Bradley
            • Votes:
              0
            • Watchers:
              6
