Details
Type: Improvement
Status: In Progress
Priority: Major
Resolution: Unresolved
3.0.0
None
None
Environment: Any environment that supports Spark.
Description
The Spark SQL rank window function needs the rows in each window partition to be sorted, and it relies on the physical operator SortExec to do the sort. The sort order given to SortExec has the window partition key placed in front of the ordering columns, so every row comparison also compares the partition key, which is unnecessary work. Instead, we can group the rows by partition key first and then sort the rows inside each group without comparing the partition key.
The Jira issue https://issues.apache.org/jira/browse/SPARK-32947 is a follow-up to this improvement.