Spark / SPARK-32056

Repartition by key should support partition coalesce for AQE


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 3.0.0
    • Fix Version/s: 3.1.0
    • Component/s: SQL
    • Labels: None
    • Environment: spark release 3.0.0

    Description

      When adaptive query execution (AQE) is enabled, the following expression should support coalescing of partitions:

      dataframe.repartition(col("somecolumn")) 

      Currently it does not, because it simply delegates to the repartition overload that takes an explicit number of partitions:

        def repartition(partitionExprs: Column*): Dataset[T] = {
          repartition(sparkSession.sessionState.conf.numShufflePartitions, partitionExprs: _*)
        }

      Repartition with an explicit number of partitions does not allow coalescing of partitions, since coalescing would break the user's expectation that the result has exactly the number of partitions specified.

      For more context, see the discussion here:

      https://github.com/apache/spark/pull/27986
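
      The distinction above can be illustrated with a small, Spark-free sketch. It is a simplified model and not Spark's actual coalescing rule: adjacent shuffle partitions are merged greedily up to a target size, and merging is skipped whenever the caller pinned the partition count. The names `CoalesceSketch`, `coalesce`, `finalPartitionCount`, `targetSize`, and `userSpecifiedNum` are hypothetical and exist only for this illustration.

```scala
// Simplified model for illustration only; this is NOT Spark's implementation.
object CoalesceSketch {

  /** Greedily groups adjacent partition indices so that each group's
    * total size stays at or below targetSize (a single oversized
    * partition still forms its own group). */
  def coalesce(sizes: Seq[Long], targetSize: Long): Seq[Seq[Int]] = {
    val groups = scala.collection.mutable.ArrayBuffer.empty[Vector[Int]]
    var current = Vector.empty[Int]
    var currentSize = 0L
    sizes.zipWithIndex.foreach { case (size, idx) =>
      // Close the current group if adding this partition would exceed the target.
      if (current.nonEmpty && currentSize + size > targetSize) {
        groups += current
        current = Vector.empty[Int]
        currentSize = 0L
      }
      current = current :+ idx
      currentSize += size
    }
    if (current.nonEmpty) groups += current
    groups.toSeq
  }

  /** Partition count after the shuffle: coalescing applies only when AQE
    * is on and the user did not ask for a specific number of partitions. */
  def finalPartitionCount(
      sizes: Seq[Long],
      targetSize: Long,
      aqeEnabled: Boolean,
      userSpecifiedNum: Boolean): Int =
    if (aqeEnabled && !userSpecifiedNum) coalesce(sizes, targetSize).length
    else sizes.length
}
```

      In this model, the behavior requested here corresponds to treating `repartition(col("somecolumn"))` as `userSpecifiedNum = false`, while `repartition(50, col("somecolumn"))` remains `userSpecifiedNum = true` and keeps all 50 partitions.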

      A simple test confirming that repartition by key does not support coalescing of partitions can be added to AdaptiveQueryExecSuite like this (it currently fails):

        test("SPARK-32056 repartition has less partitions for small data when adaptiveExecutionEnabled") {
          Seq(true, false).foreach { enableAQE =>
            withSQLConf(
              SQLConf.ADAPTIVE_EXECUTION_ENABLED.key -> enableAQE.toString,
              SQLConf.SHUFFLE_PARTITIONS.key -> "50",
              SQLConf.COALESCE_PARTITIONS_INITIAL_PARTITION_NUM.key -> "50") {
              val partitionsNum = (1 to 10).toDF.repartition($"value")
                .rdd.collectPartitions().length
              if (enableAQE) {
                assert(partitionsNum < 50)
              } else {
                assert(partitionsNum === 50)
              }
            }
          }
        }
      

       

          People

            Assignee: L. C. Hsieh (viirya)
            Reporter: koert kuipers (koert)
            Votes: 0
            Watchers: 5
