Spark / SPARK-21509

Add a config to enable adaptive query execution only for the last shuffle.

Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Won't Fix
    • Affects Version/s: 2.2.0
    • Fix Version/s: None
    • Component/s: SQL
    • Labels: None

    Description

      Adaptive query execution is a good way to avoid generating too many small files on HDFS, as mentioned in SPARK-16188.
      When adaptive query execution is enabled, all shuffles are coordinated. This has drawbacks:
      1. It is hard to balance the number of reducers (which determines processing speed) against the file size on HDFS.
      2. It generates some unnecessary shuffles (https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/EnsureRequirements.scala#L101).
      3. It generates many jobs, which adds scheduling overhead.
      We can add a config that enables adaptive query execution only for the last shuffle.
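
      The proposal can be illustrated with a small, self-contained Python model. It is only a sketch, not Spark code: the function names, the `last_only` flag, and the greedy packing rule are assumptions made for illustration, and the issue does not name the proposed config. The first function shows why coordinating a shuffle reduces small files (adjacent post-shuffle partitions are merged up to a target size); the second shows which shuffles would be coordinated if the feature were restricted to the last one.

```python
from typing import List


def coalesce_partitions(sizes_mb: List[int], target_mb: int) -> List[List[int]]:
    """Greedily merge adjacent shuffle partitions (by index) so that each
    merged group stays at or below target_mb where possible.

    Simplified model of post-shuffle partition coalescing; assumed for
    illustration, not taken from Spark's implementation.
    """
    groups: List[List[int]] = []
    current: List[int] = []
    current_size = 0
    for index, size in enumerate(sizes_mb):
        # Close the current group if adding this partition would exceed the target.
        if current and current_size + size > target_mb:
            groups.append(current)
            current, current_size = [], 0
        current.append(index)
        current_size += size
    if current:
        groups.append(current)
    return groups


def shuffles_to_coordinate(shuffles: List[str], adaptive: bool, last_only: bool) -> List[str]:
    """Return the shuffles that would be coordinated.

    `last_only` stands in for the config proposed in this issue (hypothetical
    name); `shuffles` lists a query's shuffles in execution order.
    """
    if not adaptive:
        return []
    if last_only:
        return shuffles[-1:]  # only the shuffle feeding the final write
    return list(shuffles)


if __name__ == "__main__":
    plan = ["join", "aggregate", "final_write"]
    print(shuffles_to_coordinate(plan, adaptive=True, last_only=False))
    print(shuffles_to_coordinate(plan, adaptive=True, last_only=True))
    print(coalesce_partitions([10, 20, 40, 5, 5, 64], target_mb=64))
```

      In this model, earlier shuffles keep their fixed reducer count (and so their processing speed), while only the final shuffle is merged into fewer, larger partitions before the files are written.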

          People

            Assignee: Unassigned
            Reporter: Jin Xing (jinxing6042@126.com)
            Votes: 0
            Watchers: 2

            Dates

              Created:
              Updated:
              Resolved: