Details
- Type: Bug
- Status: Resolved
- Priority: Minor
- Resolution: Fixed
- Versions: 3.3.3, 3.4.2, 3.5.0
Description
In interpreted mode, ordering by a UDT column results in an exception. For example:
import org.apache.spark.ml.linalg.{DenseVector, Vector}

val df = Seq.tabulate(30) { x =>
  (x, x + 1, x + 2, new DenseVector(Array((x/100.0).toDouble, ((x + 1)/100.0).toDouble, ((x + 3)/100.0).toDouble)))
}.toDF("id", "c1", "c2", "c3")

df.createOrReplaceTempView("df")

// this works
sql("select * from df order by c3").collect

sql("set spark.sql.codegen.wholeStage=false")
sql("set spark.sql.codegen.factoryMode=NO_CODEGEN")

// this gets an error
sql("select * from df order by c3").collect
The second collect action results in the following exception:
org.apache.spark.SparkIllegalArgumentException: Type UninitializedPhysicalType does not support ordered operations.
	at org.apache.spark.sql.errors.QueryExecutionErrors$.orderedOperationUnsupportedByDataTypeError(QueryExecutionErrors.scala:348)
	at org.apache.spark.sql.catalyst.types.UninitializedPhysicalType$.ordering(PhysicalDataType.scala:332)
	at org.apache.spark.sql.catalyst.types.UninitializedPhysicalType$.ordering(PhysicalDataType.scala:329)
	at org.apache.spark.sql.catalyst.expressions.InterpretedOrdering.compare(ordering.scala:60)
	at org.apache.spark.sql.catalyst.expressions.InterpretedOrdering.compare(ordering.scala:39)
	at org.apache.spark.sql.execution.UnsafeExternalRowSorter$RowComparator.compare(UnsafeExternalRowSorter.java:254)
Note: You don't get an error if you use show rather than collect. This is because show will implicitly add a limit, in which case the ordering is performed by TakeOrderedAndProject rather than UnsafeExternalRowSorter.
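One way to see the difference between the two paths is to compare the physical plans Spark picks with and without a limit. The following is a rough sketch, not part of the original report: the `sample` DataFrame and its plain Int sort key are stand-ins chosen so the snippet runs without the ML dependency, on the assumption that the planner's choice of operator here does not depend on the type of the sort column. The limit of 20 only approximates what show adds implicitly.

```scala
import org.apache.spark.sql.SparkSession

// Reuses the active session in spark-shell; otherwise starts a local one.
val spark = SparkSession.builder()
  .master("local[1]")
  .appName("ordering-plan-check") // hypothetical app name
  .getOrCreate()
import spark.implicits._

// Stand-in data (assumption): a plain Int column replaces the UDT column c3.
val sample = Seq.tabulate(30)(x => (x, x + 1)).toDF("id", "c1")

// Physical plan for a full sort, the shape that collect executes.
val fullSortPlan = sample.orderBy("c1").queryExecution.sparkPlan.toString

// Physical plan for a sort followed by a limit, similar to what show triggers.
val limitedPlan = sample.orderBy("c1").limit(20).queryExecution.sparkPlan.toString

println(fullSortPlan)
println(limitedPlan)
```

With default settings, the limited plan should name TakeOrderedAndProject while the full-sort plan should contain a Sort node instead, which is consistent with the note above about why show does not hit the failing comparator.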