Spark / SPARK-39031

NaN != NaN in pivot


Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 3.2.0
    • Fix Version/s: None
    • Component/s: SQL
    • Labels: None

    Description

      I know this is an odd corner case, but NaN != NaN in pivot, which is inconsistent with other places in Spark (groupBy, for example, treats NaN values as equal):

      scala> Seq(Double.NaN, Double.NaN, 1.0, Double.NaN, 1.0, 1.0).toDF.groupBy("value").count.show()
      +-----+-----+                                                                   
      |value|count|
      +-----+-----+
      |  NaN|    3|
      |  1.0|    3|
      +-----+-----+
      
      
      scala> Seq(Double.NaN, Double.NaN, 1.0, Double.NaN, 1.0, 1.0).toDF.groupBy("value").pivot("value").count.show()
      +-----+----+----+
      |value| 1.0| NaN|
      +-----+----+----+
      |  NaN|null|null|
      |  1.0|   3|null|
      +-----+----+----+

      It looks like the issue is that in PivotFirst, when the pivotColumn is an AtomicType, a HashMap is used, but for other types a TreeMap with an interpretedOrdering is used. If DoubleType and FloatType also used the TreeMap, the equality checks would be correct. I am not able to actually test that, though, because pivoting on an array or struct raises an analysis exception:

      scala> Seq(Double.NaN, Double.NaN, 1.0, Double.NaN, 1.0, 1.0).toDF.selectExpr("value", "struct(value) as ar_value").groupBy("value").pivot("ar_value").count.show()
      java.lang.RuntimeException: Unsupported literal type class org.apache.spark.sql.catalyst.expressions.GenericRowWithSchema [1.0]
        at org.apache.spark.sql.errors.QueryExecutionErrors$.literalTypeUnsupportedError(QueryExecutionErrors.scala:182)
        at org.apache.spark.sql.catalyst.expressions.Literal$.apply(literals.scala:101)
      
      

      scala> Seq(Double.NaN, Double.NaN, 1.0, Double.NaN, 1.0, 1.0).toDF.selectExpr("value", "array(value) as ar_value").groupBy("value").pivot("ar_value").count.show()
      org.apache.spark.sql.AnalysisException: Invalid pivot value '[1.0]': value data type array<double> does not match pivot column data type array<double>
        at org.apache.spark.sql.errors.QueryCompilationErrors$.pivotValDataTypeMismatchError(QueryCompilationErrors.scala:85)
        at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolvePivot$$anonfun$apply$10.$anonfun$applyOrElse$21(Analyzer.scala:762)
        at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:286)
        at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
        at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
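
      As a minimal sketch of the suspected root cause (using plain Scala collections rather than PivotFirst itself, so the map types here are stand-ins): Scala's value equality on Doubles follows IEEE semantics, so NaN == NaN is false and a hash-based lookup on a NaN pivot value always misses, while a TreeMap keyed by a total ordering such as java.lang.Double.compare treats NaN as equal to itself and finds it.

      ```scala
      import scala.collection.immutable.{HashMap, TreeMap}

      object NaNMapDemo {
        // Hash-based lookup: BoxesRunTime value equality gives NaN != NaN,
        // so a NaN key can never be found again after insertion.
        val hashed: HashMap[Double, Int] = HashMap(Double.NaN -> 0, 1.0 -> 1)

        // Tree-based lookup with a total ordering: java.lang.Double.compare
        // returns 0 for (NaN, NaN), so lookup by NaN succeeds.
        val totalOrder: Ordering[Double] =
          Ordering.fromLessThan((a, b) => java.lang.Double.compare(a, b) < 0)
        val ordered: TreeMap[Double, Int] =
          TreeMap(Double.NaN -> 0, 1.0 -> 1)(totalOrder)

        def main(args: Array[String]): Unit = {
          println(hashed.get(Double.NaN))  // None: the NaN key is never matched
          println(ordered.get(Double.NaN)) // Some(0): total ordering matches NaN
        }
      }
      ```

      This mirrors why groupBy (which sorts/normalizes NaN) sees one NaN group while the HashMap path in pivot never matches the NaN pivot value.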
      

People

    Assignee: Unassigned
    Reporter: revans2 (Robert Joseph Evans)