Description
When adaptive query execution (AQE) is enabled, the following expression should support coalescing of shuffle partitions:
dataframe.repartition(col("somecolumn"))
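For reference, coalescing of shuffle partitions under AQE is governed by the following configuration keys (values here are illustrative defaults, not prescriptions):

```properties
# Enable adaptive query execution
spark.sql.adaptive.enabled=true
# Allow AQE to merge small shuffle partitions after a stage completes
spark.sql.adaptive.coalescePartitions.enabled=true
# Initial number of partitions before coalescing (example value)
spark.sql.adaptive.coalescePartitions.initialPartitionNum=50
```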
Currently it does not, because it simply delegates to the repartition overload that takes an explicit number of partitions:
def repartition(partitionExprs: Column*): Dataset[T] = {
  repartition(sparkSession.sessionState.conf.numShufflePartitions, partitionExprs: _*)
}
and repartition with an explicit number of partitions does not allow coalescing, since coalescing would break the user's expectation that the result has exactly the specified number of partitions.
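The problem can be sketched with a simplified, self-contained model (this is not the actual Spark source; the types and names below are hypothetical stand-ins): because the expression overload delegates to the count overload, the planner cannot distinguish "the user asked for exactly N partitions" from "N is just the default shuffle parallelism", so it conservatively disables coalescing in both cases.

```scala
object RepartitionModel {
  // Hypothetical stand-in for the physical shuffle node produced by repartition.
  case class Shuffle(numPartitions: Int, userSpecifiedCount: Boolean)

  // Mirrors the default of spark.sql.shuffle.partitions.
  val defaultShufflePartitions = 200

  // Overload with an explicit count: the count is treated as user-specified.
  def repartition(numPartitions: Int): Shuffle =
    Shuffle(numPartitions, userSpecifiedCount = true)

  // Delegation as in Dataset.repartition(partitionExprs: Column*):
  // it reuses the count overload, losing the fact that the user
  // never asked for a specific number of partitions.
  def repartitionByExprs(): Shuffle =
    repartition(defaultShufflePartitions)

  // AQE may only coalesce when the count was not user-specified.
  def aqeMayCoalesce(s: Shuffle): Boolean = !s.userSpecifiedCount

  def main(args: Array[String]): Unit = {
    val plan = repartitionByExprs()
    println(aqeMayCoalesce(plan)) // prints "false": coalescing is disabled
  }
}
```

The fix discussed in the linked PR amounts to letting the expression-only overload mark the count as a default rather than a user choice, so AQE remains free to coalesce.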
For more context, see the discussion here:
https://github.com/apache/spark/pull/27986
A simple test confirming that repartition by key does not support coalescing of partitions can be added to AdaptiveQueryExecSuite (it currently fails):
test("SPARK-32056 repartition has less partitions for small data when adaptiveExecutionEnabled") {
  Seq(true, false).foreach { enableAQE =>
    withSQLConf(
      SQLConf.ADAPTIVE_EXECUTION_ENABLED.key -> enableAQE.toString,
      SQLConf.COALESCE_PARTITIONS_INITIAL_PARTITION_NUM.key -> "50",
      SQLConf.SHUFFLE_PARTITIONS.key -> "50") {
      val partitionsNum = (1 to 10).toDF.repartition($"value")
        .rdd.collectPartitions().length
      if (enableAQE) {
        assert(partitionsNum < 50)
      } else {
        assert(partitionsNum === 50)
      }
    }
  }
}
Issue Links
- relates to SPARK-31220 repartition obeys spark.sql.adaptive.coalescePartitions.initialPartitionNum when spark.sql.adaptive.enabled (Resolved)