Details
- Type: Bug
- Status: Open
- Priority: Major
- Resolution: Unresolved
- Affects Version/s: 3.2.0
Description
I know this is an odd corner case, but NaN != NaN in pivot, which is inconsistent with how NaN is handled elsewhere in Spark.
scala> Seq(Double.NaN, Double.NaN, 1.0, Double.NaN, 1.0, 1.0).toDF.groupBy("value").count.show()
+-----+-----+
|value|count|
+-----+-----+
|  NaN|    3|
|  1.0|    3|
+-----+-----+

scala> Seq(Double.NaN, Double.NaN, 1.0, Double.NaN, 1.0, 1.0).toDF.groupBy("value").pivot("value").count.show()
+-----+----+----+
|value| 1.0| NaN|
+-----+----+----+
|  NaN|null|null|
|  1.0|   3|null|
+-----+----+----+
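The inconsistency above comes down to how NaN is compared. A minimal sketch in plain Scala (no Spark involved) showing the three relevant semantics:

```scala
// IEEE 754 primitive equality says NaN is not equal to itself,
// but java.lang.Double's total ordering gives NaN a definite
// position, so compare(NaN, NaN) == 0. Boxed equality also
// treats NaN as equal to itself.
object NaNSemantics {
  def main(args: Array[String]): Unit = {
    println(Double.NaN == Double.NaN)                              // false
    println(java.lang.Double.compare(Double.NaN, Double.NaN) == 0) // true
    println(java.lang.Double.valueOf(Double.NaN)
      .equals(java.lang.Double.valueOf(Double.NaN)))               // true
  }
}
```

So any code path that matches pivot values with primitive `==` will never match a NaN, while a comparison-based path will.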
It looks like the issue is that in PivotFirst, if the pivotColumn is an AtomicType a HashMap is used, but for other types a TreeMap with an interpretedOrdering is used. If DoubleType and FloatType used the TreeMap instead, the equality checks would be correct. But I am not able to really test this, because if I try to pivot on an array or struct I get analysis exceptions.
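The hypothesis above can be illustrated with a small sketch. This is not Spark's actual PivotFirst code, just an assumed model of the two lookup strategies: an equality-style match (as in the AtomicType fast path) misses NaN, while a TreeMap keyed by a total ordering (as with interpretedOrdering) finds it.

```scala
import scala.collection.mutable

// Hypothetical model of the two pivot-value lookup strategies.
object PivotLookupSketch {
  // Total ordering in the spirit of interpretedOrdering: NaN compares
  // equal to NaN because java.lang.Double.compare is total.
  val totalOrdering: Ordering[Double] = new Ordering[Double] {
    def compare(x: Double, y: Double): Int = java.lang.Double.compare(x, y)
  }

  def main(args: Array[String]): Unit = {
    val pivotValues = Seq(1.0, Double.NaN)

    // Equality-style match: NaN is never found.
    println(pivotValues.indexWhere(_ == Double.NaN)) // -1

    // TreeMap with the total ordering: NaN is found.
    val tm = mutable.TreeMap.empty[Double, Int](totalOrdering)
    pivotValues.zipWithIndex.foreach { case (v, i) => tm(v) = i }
    println(tm.get(Double.NaN)) // Some(1)
  }
}
```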
scala> Seq(Double.NaN, Double.NaN, 1.0, Double.NaN, 1.0, 1.0).toDF.selectExpr("value", "struct(value) as ar_value").groupBy("value").pivot("ar_value").count.show()
java.lang.RuntimeException: Unsupported literal type class org.apache.spark.sql.catalyst.expressions.GenericRowWithSchema [1.0]
  at org.apache.spark.sql.errors.QueryExecutionErrors$.literalTypeUnsupportedError(QueryExecutionErrors.scala:182)
  at org.apache.spark.sql.catalyst.expressions.Literal$.apply(literals.scala:101)
scala> Seq(Double.NaN, Double.NaN, 1.0, Double.NaN, 1.0, 1.0).toDF.selectExpr("value", "array(value) as ar_value").groupBy("value").pivot("ar_value").count.show()
org.apache.spark.sql.AnalysisException: Invalid pivot value '[1.0]': value data type array<double> does not match pivot column data type array<double>
  at org.apache.spark.sql.errors.QueryCompilationErrors$.pivotValDataTypeMismatchError(QueryCompilationErrors.scala:85)
  at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolvePivot$$anonfun$apply$10.$anonfun$applyOrElse$21(Analyzer.scala:762)
  at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:286)
  at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
  at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)