  1. Spark
  2. SPARK-37063 SQL Adaptive Query Execution QA: Phase 2
  3. SPARK-37357

Add small partition factor for rebalance partitions


Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.3.0
    • Fix Version/s: 3.3.0
    • Component/s: SQL
    • Labels: None

    Description

      `Rebalance` can split a large reduce partition into smaller ones. However, we have seen many SQL queries produce small files because of the last partition left over after the split.

      Let's say we have one reduce partition and six map partitions, and the block sizes are:
      [10, 10, 10, 10, 10, 10]
      If the target size is 50, we get two files, of sizes 50 and 10. The problem gets worse when there are thousands of reduce partitions, since each one can leave behind its own small file.

      It would be helpful to be able to control the minimum partition size.
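
The example above can be sketched in a few lines of Python. This is only an illustration of the idea, not Spark's implementation: the function `split_blocks` and its parameter `small_partition_factor` are hypothetical names, and the rule assumed here is that a trailing partition smaller than `target_size * small_partition_factor` is merged into the partition before it.

```python
def split_blocks(block_sizes, target_size, small_partition_factor=0.0):
    """Greedily pack consecutive map blocks into partitions of at most target_size.

    Assumption for illustration: if the last partition is smaller than
    target_size * small_partition_factor, merge it into the previous one.
    """
    partitions = []
    current = 0
    for size in block_sizes:
        # Close the current partition if adding this block would exceed the target.
        if current > 0 and current + size > target_size:
            partitions.append(current)
            current = 0
        current += size
    if current > 0:
        partitions.append(current)

    # Merge a too-small trailing partition into its predecessor.
    threshold = target_size * small_partition_factor
    if len(partitions) > 1 and partitions[-1] < threshold:
        tail = partitions.pop()
        partitions[-1] += tail
    return partitions


blocks = [10, 10, 10, 10, 10, 10]
without_factor = split_blocks(blocks, target_size=50)
with_factor = split_blocks(blocks, target_size=50, small_partition_factor=0.5)
print(without_factor)  # [50, 10]
print(with_factor)     # [60]
```

With a factor of 0, the split reproduces the two files of sizes 50 and 10 described above. With a factor of 0.5, the trailing partition of size 10 falls below the threshold of 25 and is folded into the first, giving a single partition of size 60 that slightly exceeds the target.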


          People

            Assignee: ulysses XiDuo You
            Reporter: ulysses XiDuo You