Details
- Type: Question
- Status: Closed
- Priority: Major
- Resolution: Fixed
- Affects Version/s: 3.3.0
- Fix Version/s: None
- Component/s: None
- Labels: spark3.3.0
Description
Question:
When using OptimizeSkewInRebalancePartitions to insert dynamic partitions (three-level partitioning) into a Hive table whose partitions are skewed, I found that with spark.sql.shuffle.partitions set to a relatively large value (10000), the written output does not respect the configured advisoryPartitionSizeInBytes: the skewed partition's data is processed by a single task and written to a single file. However, when I reduce spark.sql.shuffle.partitions to 2000, the skewed partition is optimized by OptimizeSkewInRebalancePartitions as expected: its data is split into batches and written out as multiple files.
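For context, OptimizeSkewInRebalancePartitions is expected to split any shuffle partition larger than the advisory target size into roughly advisory-sized chunks. A small arithmetic sketch of the expected behavior (the 10 GiB skewed-partition size is an assumed illustration, not a figure from this report):

```python
import math

# Expected splitting arithmetic for OptimizeSkewInRebalancePartitions.
# advisory size matches spark.sql.finalStage.adaptive.advisoryPartitionSizeInBytes = 512M below;
# the 10 GiB skewed dynamic-partition size is an assumption for illustration.
advisory_bytes = 512 * 1024**2
skewed_partition_bytes = 10 * 1024**3

# One oversized reduce partition should become ~20 tasks of ~512 MiB each,
# instead of one task writing a single 10 GiB file.
expected_splits = math.ceil(skewed_partition_bytes / advisory_bytes)
print(expected_splits)  # 20
```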
Spark AQE config:
spark.sql.adaptive.coalescePartitions.enabled true
spark.sql.adaptive.skewJoin.enabled true
spark.sql.adaptive.advisoryPartitionSizeInBytes 128M
spark.sql.finalStage.adaptive.advisoryPartitionSizeInBytes 512M
spark.sql.finalStage.adaptive.coalescePartitions.minPartitionSize 128M
spark.sql.finalStage.adaptive.coalescePartitions.parallelismFirst false
spark.sql.adaptive.skewJoin.skewedPartitionThresholdInBytes 1024M
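For reference, a minimal repro sketch of the kind of write that exercises OptimizeSkewInRebalancePartitions via a REBALANCE hint (the database, table, and partition-column names db.src, db.tgt, dt, hr, bucket are hypothetical, not taken from this report):

```sql
-- Hypothetical repro sketch; table and partition-column names are assumptions.
SET spark.sql.shuffle.partitions = 10000;   -- skewed partition is NOT split at this value
-- SET spark.sql.shuffle.partitions = 2000; -- skewed partition IS split at this value

INSERT OVERWRITE TABLE db.tgt PARTITION (dt, hr, bucket)
SELECT /*+ REBALANCE(dt, hr, bucket) */ *
FROM db.src;
```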
With spark.sql.shuffle.partitions = 10000: (attached screenshot)
With spark.sql.shuffle.partitions = 2000: (attached screenshot)
SQL time: (attached screenshot)
Plan: (attached screenshot)