Uploaded image for project: 'Spark'
  1. Spark
  2. SPARK-50257

[Core]If a stage contains ExpandExec, the CoalesceShufflePartitions rule will not be adjusted during the AQE phase

    XMLWordPrintableJSON

Details

    • Improvement
    • Status: Open
    • Major
    • Resolution: Unresolved
    • 4.0.0
    • 4.0.0
    • Spark Core
    • None

    Description

      【sql】

      // code placeholder
      SELECT
             /*+ SHUFFLE_MERGE(b) */
             s_date,
             sum(s_quantity * i_price) AS total_sales
         FROM
             sales a
             JOIN items b ON s_item_id = i_item_id
         WHERE
             i_price < 10
         GROUP BY
             s_date with rollup;
       

      Set spark.sql.shuffle.partitions=1000

      After aqe:

      The parallel reads in the ExpandExecut phase have been adjusted to 71, reducing parallelism. The ExpandExecut phase can lead to data expansion, and a decrease in parallelism can result in longer task execution times.

      If AGE is turned off as a whole, AQE optimization cannot be enjoyed in other stages. If it is found that ExpandExec is included in the current stage, partition merging will not be performed for this issue.

       

      Attachments

        1. 截屏2024-11-07 13.52.45.png
          369 kB
          guihuawen

        Activity

          People

            Unassigned Unassigned
            guihuawen guihuawen
            Votes:
            0 Vote for this issue
            Watchers:
            0 Start watching this issue

            Dates

              Created:
              Updated: