Spark / SPARK-42912

OptimizeSkewInRebalancePartitions does not take effect in some cases


Details

    • Type: Question
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.3.0
    • Fix Version/s: None
    • Component/s: Spark Core
    • Labels: None
    • Environment: spark3.3.0

    Description

      Question:

      When using OptimizeSkewInRebalancePartitions while inserting into dynamic partitions (three partition levels) of a Hive table whose partition data is skewed, I found that with spark.sql.shuffle.partitions set to a relatively large value (10000), the output is not split into files according to the configured advisoryPartitionSizeInBytes: each skewed partition's data is processed by a single task and written into a single file. When I reduce spark.sql.shuffle.partitions to 2000, however, OptimizeSkewInRebalancePartitions takes effect as expected: the skewed partition's data is split into batches, processed by several tasks, and each batch is written to its own file.
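
The expected behaviour described above can be pictured with a simplified model. This is not Spark's implementation: `split_skewed_partition` is a hypothetical greedy function, loosely based on the idea that a rebalance partition larger than the target size is split along map-output boundaries into several read tasks, each writing its own file. Spark's real logic handles details omitted here (for example, how very small trailing pieces are treated), and the 40 × 64 MiB input is invented for illustration.

```python
from typing import List

MiB = 1024 * 1024


def split_skewed_partition(map_sizes: List[int], target_size: int) -> List[List[int]]:
    """Hypothetical helper: group one reducer partition's map outputs into read tasks.

    map_sizes[i] is the number of bytes map task i wrote to this partition.
    Each returned group is a list of map indices read by one task; for an
    INSERT, each task writes its own output file.
    """
    if sum(map_sizes) <= target_size:
        # Not larger than the target: one task reads the whole partition.
        return [list(range(len(map_sizes)))]

    groups: List[List[int]] = []
    current: List[int] = []
    current_bytes = 0
    for map_index, size in enumerate(map_sizes):
        # Close the current group if this map output would push it past the target.
        if current and current_bytes + size > target_size:
            groups.append(current)
            current, current_bytes = [], 0
        current.append(map_index)
        current_bytes += size
    if current:
        groups.append(current)
    return groups


# Hypothetical skewed partition: 40 map tasks each wrote 64 MiB to it (2560 MiB total).
skewed = [64 * MiB] * 40
tasks = split_skewed_partition(skewed, target_size=512 * MiB)
print(len(tasks))  # 5 tasks, so 5 output files instead of 1
```

The greedy grouping keeps map outputs contiguous, which mirrors the constraint that a skewed partition can only be divided at map-output boundaries in this model.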

       

      Spark AQE configuration:

      spark.sql.adaptive.coalescePartitions.enabled true
      spark.sql.adaptive.skewedJoin.enabled true
      spark.sql.adaptive.advisoryPartitionSizeInBytes 128M
      spark.sql.finalStage.adaptive.advisoryPartitionSizeInBytes 512M
      spark.sql.finalStage.adaptive.coalescePartitions.minPartitionSize 128M
      spark.sql.finalStage.adaptive.coalescePartitions.parallelismFirst false
      spark.sql.adaptive.skewJoin.skewedPartitionThresholdInBytes  1024M
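
The size-valued settings above can be turned into concrete numbers. This sketch assumes Spark's binary reading of size suffixes (128M = 128 × 1024² bytes), assumes the final-stage value (512M) is the target applied to the write stage, and uses an invented 10 GiB skewed partition; `size_to_bytes` and `parse_config` are hypothetical helpers, not Spark APIs. Under those assumptions, an ideal split would give that partition about 20 files rather than the single file observed with 10000 shuffle partitions.

```python
import math
import re

# Binary size suffixes, as Spark uses for size-valued settings ("128M" = 128 MiB).
# Only a subset of the suffixes Spark accepts is listed here.
_UNITS = {
    "": 1, "b": 1,
    "k": 1024, "kb": 1024,
    "m": 1024 ** 2, "mb": 1024 ** 2,
    "g": 1024 ** 3, "gb": 1024 ** 3,
    "t": 1024 ** 4, "tb": 1024 ** 4,
}


def size_to_bytes(text: str) -> int:
    """Convert a size string such as '512M' into a byte count."""
    match = re.fullmatch(r"\s*(\d+)\s*([A-Za-z]*)\s*", text)
    if match is None or match.group(2).lower() not in _UNITS:
        raise ValueError(f"unrecognised size: {text!r}")
    return int(match.group(1)) * _UNITS[match.group(2).lower()]


def parse_config(block: str) -> dict:
    """Split 'key value' lines (any amount of whitespace between them) into a dict."""
    settings = {}
    for line in block.strip().splitlines():
        key, value = line.split(maxsplit=1)
        settings[key] = value.strip()
    return settings


AQE_CONFIG = """
spark.sql.adaptive.advisoryPartitionSizeInBytes 128M
spark.sql.finalStage.adaptive.advisoryPartitionSizeInBytes 512M
spark.sql.adaptive.skewJoin.skewedPartitionThresholdInBytes  1024M
"""

settings = parse_config(AQE_CONFIG)
target = size_to_bytes(settings["spark.sql.finalStage.adaptive.advisoryPartitionSizeInBytes"])

# Hypothetical 10 GiB skewed dynamic partition; idealised file count at the target size.
skewed_bytes = 10 * 1024 ** 3
expected_files = max(1, math.ceil(skewed_bytes / target))
print(expected_files)  # 20
```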

       

      [Screenshot: spark.sql.shuffle.partitions = 10000]

      [Screenshot: spark.sql.shuffle.partitions = 2000]

      [Screenshot: SQL execution time]

      [Screenshot: query plan]


          People

            Assignee: Unassigned
            Reporter: thomasgx