  1. Spark
  2. SPARK-32096

Improve sorting performance for Spark SQL rank window function


Details

    • Type: Improvement
    • Status: In Progress
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 3.0.0
    • Fix Version/s: None
    • Component/s: SQL
    • Labels: None
    • Environment: Any environment that supports Spark.

    Description

      The Spark SQL rank window function must sort the rows within each window partition, and it relies on the execution operator SortExec to do the sort. The window partition key is placed at the front of the sort order, so every comparison made during the sort also compares the partition key, which is unnecessary. Instead, we can first group the rows by partition key and then sort the rows inside each group without comparing the partition key.
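      The idea in the description can be illustrated with a small plain-Python sketch. It is not Spark code and not the patch for this issue. The function names (rank_rows, rank_with_global_sort, rank_with_grouping) are hypothetical, and hash-based grouping is only an assumption about how rows could be grouped. The first strategy sorts on (partition key, order key) as the description says SortExec does today. The second groups rows by partition key and then sorts each group on the order key alone.

```python
from collections import defaultdict
from itertools import groupby


def rank_rows(sorted_rows, order_key):
    """Assign SQL RANK() values to rows already sorted by order_key."""
    ranked = []
    prev, rank = object(), 0  # sentinel that never equals a real key
    for position, row in enumerate(sorted_rows, start=1):
        key = order_key(row)
        if key != prev:
            # A new order-key value starts a new rank equal to its position.
            rank, prev = position, key
        ranked.append((row, rank))
    return ranked


def rank_with_global_sort(rows, part_key, order_key):
    """Baseline: one sort whose key starts with the partition key."""
    ordered = sorted(rows, key=lambda r: (part_key(r), order_key(r)))
    return {
        part: rank_rows(list(group), order_key)
        for part, group in groupby(ordered, key=part_key)
    }


def rank_with_grouping(rows, part_key, order_key):
    """Proposed shape: group by partition key, then sort inside each group."""
    groups = defaultdict(list)
    for row in rows:
        # Grouping hashes the partition key; it takes no part in the sort.
        groups[part_key(row)].append(row)
    return {
        part: rank_rows(sorted(group, key=order_key), order_key)
        for part, group in groups.items()
    }


def part_key(row):
    return row[0]


def order_key(row):
    return row[1]


# Hypothetical sample data: (partition key, value to rank by).
rows = [("a", 3), ("b", 1), ("a", 1), ("b", 2), ("a", 3), ("b", 1)]

# Both strategies produce the same ranks per partition.
assert rank_with_global_sort(rows, part_key, order_key) == \
    rank_with_grouping(rows, part_key, order_key)
```

      In this sketch the per-group sort compares only the order key, which is the saving the description refers to. Whether grouping is cheaper overall in Spark depends on factors this issue text does not cover, such as memory use and spilling.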

       

      SPARK-32947 (https://issues.apache.org/jira/browse/SPARK-32947) is a follow-up to this improvement.



          People

            Assignee: Unassigned
            Reporter: Zikun (xuzikun2003)

            Dates

              Created:
              Updated:
