  1. Spark
  2. SPARK-32096

Improve sorting performance for Spark SQL rank window function


Details

    • Type: Improvement
    • Status: In Progress
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 3.0.0
    • Fix Version/s: None
    • Component/s: SQL
    • Labels: None
    • Environment: Any environment that supports Spark.

    Description

      The Spark SQL rank window function must sort the data within each window partition, and it relies on the execution operator SortExec to do the sort. The window partition key is placed at the front of the sort order, so every row comparison during the sort also compares the partition key, which is unnecessary work. Instead, we can group the rows by partition key first and then sort the rows inside each group without comparing the partition key.
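
      The idea can be illustrated outside Spark with a minimal plain-Python sketch. This is not the SortExec implementation or the proposed patch; the sample rows and the function names (rank_sorted, rank_full_sort, rank_group_then_sort) are hypothetical and exist only for illustration. The first variant sorts on the composite key (partition key, order value), as the current plan effectively does; the second groups rows by partition key first and sorts each group on the order value alone.

```python
from collections import defaultdict
from itertools import groupby
from operator import itemgetter

# Hypothetical sample rows: (partition_key, order_value)
rows = [("b", 3), ("a", 2), ("b", 1), ("a", 2), ("a", 1)]


def rank_sorted(values):
    """RANK() semantics over already-sorted values: ties share a rank."""
    ranks = []
    for i, v in enumerate(values):
        if i > 0 and v == values[i - 1]:
            ranks.append(ranks[-1])  # tie: reuse the previous rank
        else:
            ranks.append(i + 1)      # position-based rank (leaves gaps after ties)
    return ranks


def rank_full_sort(rows):
    """Baseline: one sort whose key starts with the partition key."""
    ordered = sorted(rows, key=itemgetter(0, 1))  # compares the partition key every time
    out = []
    for key, group in groupby(ordered, key=itemgetter(0)):
        values = [v for _, v in group]
        out.extend((key, v, r) for v, r in zip(values, rank_sorted(values)))
    return out


def rank_group_then_sort(rows):
    """Alternative: group by partition key, then sort each group by order value only."""
    groups = defaultdict(list)
    for key, value in rows:
        groups[key].append(value)    # grouping step: no ordering comparisons
    out = []
    for key, values in groups.items():
        values.sort()                # comparisons touch only the order value
        out.extend((key, v, r) for v, r in zip(values, rank_sorted(values)))
    return out


full = rank_full_sort(rows)
grouped = rank_group_then_sort(rows)
```

      Both variants assign the same rank to every row; only the order in which partitions are emitted may differ, because the grouped variant does not order the partitions themselves.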

       

      SPARK-32947 (https://issues.apache.org/jira/browse/SPARK-32947) is a follow-up to this improvement.

          People

            Assignee: Unassigned
            Reporter: Zikun (xuzikun2003)
            Votes: 0
            Watchers: 8

            Dates

              Created:
              Updated: