Spark / SPARK-42912

Some cases do not take effect when using OptimizeSkewInRebalancePartitions

Details

    • Type: Question
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.3.0
    • Fix Version/s: None
    • Component/s: Spark Core
    • Labels: None
    • Environment: spark3.3.0

    Description

      Question:

      When using OptimizeSkewInRebalancePartitions to insert into a Hive table with dynamic partitions (three partition levels, with skewed partitions), I found that when spark.sql.shuffle.partitions is set to a relatively large value (10000), the output does not follow the configured advisoryPartitionSizeInBytes: the data of a skewed partition is processed by a single task and written into a single file. When I reduce spark.sql.shuffle.partitions to 2000, the skewed partition is optimized by OptimizeSkewInRebalancePartitions as expected: its data is processed in batches and written out to files.
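
To make the expectation concrete, here is a rough arithmetic sketch in Python. It is an idealized model, not Spark's actual splitting logic. The 10 GiB skewed partition size is a hypothetical figure (the report does not give one), and the sketch assumes the 512M final-stage advisory size from the configuration in this report is the target that applies.

```python
import math

MIB = 1024 * 1024


def expected_output_files(partition_bytes: int, advisory_bytes: int) -> int:
    """Idealized file count if one shuffle partition were split into
    chunks of roughly advisory_bytes each (never fewer than one)."""
    if advisory_bytes <= 0:
        raise ValueError("advisory_bytes must be positive")
    return max(1, math.ceil(partition_bytes / advisory_bytes))


# Hypothetical skewed partition size; not taken from the report.
skewed_partition_bytes = 10 * 1024 * MIB
# Final-stage advisory size as listed in the report's configuration (512M).
advisory_bytes = 512 * MIB

print(expected_output_files(skewed_partition_bytes, advisory_bytes))  # 20
```

Under these assumptions the skewed partition would be expected to produce about 20 files, whereas the behaviour reported with 10000 shuffle partitions corresponds to a single file.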

       

      Spark AQE config:

      spark.sql.adaptive.coalescePartitions.enabled true
      spark.sql.adaptive.skewedJoin.enabled true
      spark.sql.adaptive.advisoryPartitionSizeInBytes 128M
      spark.sql.finalStage.adaptive.advisoryPartitionSizeInBytes 512M
      spark.sql.finalStage.adaptive.coalescePartitions.minPartitionSize 128M
      spark.sql.finalStage.adaptive.coalescePartitions.parallelismFirst false
      spark.sql.adaptive.skewJoin.skewedPartitionThresholdInBytes 1024M
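
For anyone trying to reproduce this, the settings above can be turned into `--conf` arguments with a small Python helper. This is only a sketch: the keys and values are copied verbatim from the report, `to_conf_args` is a hypothetical helper name, and using `spark-sql` as the launcher is an assumption, since the report does not say how the job was submitted.

```python
# Settings copied verbatim from the report, plus the shuffle partition
# count of the failing case (10000).
settings = {
    "spark.sql.shuffle.partitions": "10000",
    "spark.sql.adaptive.coalescePartitions.enabled": "true",
    "spark.sql.adaptive.skewedJoin.enabled": "true",
    "spark.sql.adaptive.advisoryPartitionSizeInBytes": "128M",
    "spark.sql.finalStage.adaptive.advisoryPartitionSizeInBytes": "512M",
    "spark.sql.finalStage.adaptive.coalescePartitions.minPartitionSize": "128M",
    "spark.sql.finalStage.adaptive.coalescePartitions.parallelismFirst": "false",
    "spark.sql.adaptive.skewJoin.skewedPartitionThresholdInBytes": "1024M",
}


def to_conf_args(conf: dict) -> list:
    """Turn a {key: value} mapping into a flat list of --conf key=value arguments."""
    args = []
    for key, value in conf.items():
        args += ["--conf", f"{key}={value}"]
    return args


# The launcher name is an assumption; the report does not state it.
print(" ".join(["spark-sql", *to_conf_args(settings)]))
```

Changing the first entry to 2000 gives the configuration for which the report says the skew optimization works.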

       

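      10000 partitions: [screenshot, see Attachments]

      2000 partitions: [screenshot, see Attachments]

      SQL time: [screenshot, see Attachments]

      Plan: [screenshot, see Attachments]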

      Attachments

        1. image-2023-03-24-11-37-42-289.png
          63 kB
          thomasgx
        2. image-2023-03-24-11-36-54-539.png
          60 kB
          thomasgx
        3. image-2023-03-24-11-34-34-070.png
          108 kB
          thomasgx
        4. image-2023-03-24-11-31-42-564.png
          54 kB
          thomasgx
        5. image-2023-03-24-11-30-42-239.png
          48 kB
          thomasgx

          People

            Assignee: Unassigned
            Reporter: thomasgx
            Votes: 0
            Watchers: 1
