Details
- Type: Bug
- Status: Resolved
- Priority: Minor
- Resolution: Won't Fix
Description
Since SPARK-9079 and SPARK-9145, `NaN = NaN` returns true in SQL expressions and works well. The only exception is a direct comparison between `Row(Float.NaN)` and `Row(Double.NaN)`. In the following example, the last two expressions should return true and List([NaN]), respectively, for consistency with the non-NaN rows.
scala> Seq((1d,1f),(Double.NaN,Float.NaN)).toDF("a","b").registerTempTable("tmp")
scala> sql("select a,b,a=b from tmp").collect()
res1: Array[org.apache.spark.sql.Row] = Array([1.0,1.0,true], [NaN,NaN,true])
scala> val row_a = sql("select a from tmp").collect()
row_a: Array[org.apache.spark.sql.Row] = Array([1.0], [NaN])
scala> val row_b = sql("select b from tmp").collect()
row_b: Array[org.apache.spark.sql.Row] = Array([1.0], [NaN])
scala> row_a(0) == row_b(0)
res2: Boolean = true
scala> List(row_a(0),row_b(0)).distinct
res3: List[org.apache.spark.sql.Row] = List([1.0])
scala> row_a(1) == row_b(1)
res4: Boolean = false
scala> List(row_a(1),row_b(1)).distinct
res5: List[org.apache.spark.sql.Row] = List([NaN], [NaN])
Please note the following background facts as of today.
- Double.NaN != Double.NaN (Scala/Java/IEEE Standard)
- Float.NaN != Float.NaN (Scala/Java/IEEE Standard)
- Double.NaN != Float.NaN (Scala/Java/IEEE Standard)
- Row(Double.NaN) == Row(Double.NaN)
- Row(Float.NaN) == Row(Float.NaN)
- Row(Double.NaN) != Row(Float.NaN) <== The problem reported in this issue.
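The first three facts above can be checked in plain Scala without Spark. The sketch below also includes a hypothetical helper, `nanAwareEquals`, that illustrates the consistent behaviour this issue asks for: any two NaN values compare equal regardless of Float/Double width. The helper and the object name are assumptions made for illustration; this is not Spark's `Row.equals` implementation. The boxed `equals` calls show standard Java behaviour that happens to follow the same pattern as the `Row` results listed above.

```scala
object NaNComparison {
  // Hypothetical helper (not part of Spark): treats NaN as equal to NaN
  // across Float and Double, and otherwise falls back to numeric equality.
  def nanAwareEquals(x: Any, y: Any): Boolean = (x, y) match {
    case (a: Double, b: Double) => (a.isNaN && b.isNaN) || a == b
    case (a: Float, b: Float)   => (a.isNaN && b.isNaN) || a == b
    case (a: Double, b: Float)  => (a.isNaN && b.isNaN) || a == b.toDouble
    case (a: Float, b: Double)  => (a.isNaN && b.isNaN) || a.toDouble == b
    case _                      => x == y
  }

  def main(args: Array[String]): Unit = {
    // IEEE 754 primitive comparison: NaN is never equal to anything.
    println(Double.NaN == Double.NaN) // false
    println(Float.NaN == Float.NaN)   // false
    println(Double.NaN == Float.NaN)  // false

    // Boxed Java equality: NaN equals NaN within one type, never across types.
    val boxedDouble = java.lang.Double.valueOf(Double.NaN)
    val boxedFloat  = java.lang.Float.valueOf(Float.NaN)
    println(boxedDouble.equals(java.lang.Double.valueOf(Double.NaN))) // true
    println(boxedDouble.equals(boxedFloat))                           // false

    // The width-insensitive comparison sketched by the helper.
    println(nanAwareEquals(Double.NaN, Float.NaN)) // true
    println(nanAwareEquals(1d, 1f))                // true
  }
}
```

With a comparison like this, the mixed-width NaN case would behave the same way as the `1.0` case in the example above, where `row_a(0) == row_b(0)` already returns true.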