Uploaded image for project: 'Spark'
  1. Spark
  2. SPARK-46995

Allow AQE coalesce final stage in SQL cached plan

    XMLWordPrintableJSON

Details

    • Improvement
    • Status: Open
    • Major
    • Resolution: Unresolved
    • 4.0.0
    • None
    • Spark Core, SQL

    Description

      https://github.com/apache/spark/pull/43435 and https://github.com/apache/spark/pull/43760 are fixing a correctness issue which will be triggered when AQE applied on cached query plan, specifically, when AQE coalescing the final result stage of the cached plan.

       

      The current semantic of spark.sql.optimizer.canChangeCachedPlanOutputPartitioning

      (source code):

      • when true, we enable AQE, but disable coalescing final stage (default)
      • when false, we disable AQE

       

      But let’s revisit the semantic of this config: actually for caller the only thing that matters is whether we change the output partitioning of the cached plan. And we should only try to apply AQE if possible.  Thus we want to modify the semantic of spark.sql.optimizer.canChangeCachedPlanOutputPartitioning

      • when true, we enable AQE and allow coalescing final: this might lead to perf regression, because it introduce extra shuffle
      • when false, we enable AQE, but disable coalescing final stage. (this is actually the `true` semantic of old behavior)

      Also, to keep the default behavior unchanged, we might want to flip the default value of spark.sql.optimizer.canChangeCachedPlanOutputPartitioning to `false`

       

      Attachments

        Issue Links

          Activity

            People

              Unassigned Unassigned
              liuzq12 Ziqi Liu
              Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

                Created:
                Updated: