Uploaded image for project: 'Spark'
  1. Spark
  2. SPARK-31412

New Adaptive Query Execution in Spark SQL

    XMLWordPrintableJSON

    Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.0.0
    • Fix Version/s: 3.0.0
    • Component/s: SQL
    • Labels:
      None

      Description

      SPARK-9850 proposed the basic idea of adaptive execution in Spark. In DAGScheduler, a new API is added to support submitting a single map stage.  The current implementation of adaptive execution in Spark SQL supports changing the reducer number at runtime. An Exchange coordinator is used to determine the number of post-shuffle partitions for a stage that needs to fetch shuffle data from one or multiple stages. The current implementation adds ExchangeCoordinator while we are adding Exchanges. However there are some limitations. First, it may cause additional shuffles that may decrease the performance. We can see this from EnsureRequirements rule when it adds ExchangeCoordinator.  Secondly, it is not a good idea to add ExchangeCoordinators while we are adding Exchanges because we don’t have a global picture of all shuffle dependencies of a post-shuffle stage. I.e. for 3 tables’ join in a single stage, the same ExchangeCoordinator should be used in three Exchanges but currently two separated ExchangeCoordinator will be added. Thirdly, with the current framework it is not easy to implement other features in adaptive execution flexibly like changing the execution plan and handling skewed join at runtime.

      We'd like to introduce a new way to do adaptive execution in Spark SQL and address the limitations. The idea is described at https://docs.google.com/document/d/1mpVjvQZRAkD-Ggy6-hcjXtBPiQoVbZGe3dLnAKgtJ4k/edit?usp=sharing

        Attachments

          Issue Links

          1.
          The basic framework for the new Adaptive Query Execution Sub-task Resolved Carson Wang
          2.
          Adjust post shuffle partition number in adaptive execution Sub-task Resolved Carson Wang
          3.
          Disable OptimizeSkewJoin rule if introducing additional shuffle. Sub-task Resolved Ke Jia
          4.
          collect the runtime statistics of row count in map stage Sub-task Open Unassigned
          5.
          Optimize skewed join at runtime with new Adaptive Execution Sub-task Resolved Ke Jia
          6.
          add metrics to AQE shuffle reader Sub-task Resolved Wenchen Fan
          7.
          add an individual config for skewed partition threshold Sub-task Resolved Wenchen Fan
          8.
          optimize skew join after shuffle partitions are coalesced Sub-task Resolved Wenchen Fan
          9.
          make skew join split skewed partitions more evenly Sub-task Resolved Wenchen Fan
          10.
          refine AQE config names Sub-task Resolved Wenchen Fan
          11.
          Dynamically reuse subqueries in AQE Sub-task Resolved Wei Xue
          12.
          Add a simple cost check for Adaptive Query Execution Sub-task Resolved Wei Xue
          13.
          Optimize shuffle reader to local shuffle reader when smj converted to bhj in adaptive execution Sub-task Resolved Ke Jia
          14.
          improve the splitting of skewed partitions Sub-task Resolved Wenchen Fan
          15.
          Add the user guide for Adaptive Query Execution Sub-task Resolved Ke Jia
          16.
          change the default value of minPartitionNum in AQE Sub-task Resolved Wenchen Fan
          17.
          Avoid changing SMJ to BHJ if the build side has a high ratio of empty partitions Sub-task Resolved Wei Xue
          18.
          Add tree traversal helper for adaptive spark plans Sub-task Resolved Wei Xue
          19.
          Optimize shuffle fetch of contiguous partition IDs Sub-task Resolved Yuanjian Li
          20.
          LocalShuffleReaderExec.outputPartitioning should use the corrected attributes Sub-task Resolved Wenchen Fan
          21.
          Improve the local reader performance by changing the task number from 1 to multi Sub-task Resolved Ke Jia
          22.
          Reading of csv file fails with adaptive execution turned on Sub-task Resolved Wenchen Fan
          23.
          Catch the exception when do materialize in AQE Sub-task Resolved Ke Jia
          24.
          Add adaptive execution context Sub-task Resolved Wei Xue
          25.
          remove ReusedQueryStageExec Sub-task Resolved Wenchen Fan
          26.
          Fix tests when enable Adaptive Query Execution Sub-task Resolved Ke Jia
          27.
          reset the metrics info of AdaptiveSparkPlanExec plan when enable aqe Sub-task Resolved Ke Jia
          28.
          Fix the NoSuchElementException exception when enable AQE with InSubquery use case Sub-task Resolved Ke Jia
          29.
          Fix the subquery metrics showing issue in UI When enable AQE Sub-task Resolved Ke Jia
          30.
          coalesce shuffle reader with splitting shuffle fetch request fails Sub-task Resolved Wenchen Fan
          31.
          AQE should not issue a "not supported" warning for queries being by-passed Sub-task Resolved Wenchen Fan
          32.
          Combine the skewed readers into one in AQE skew join optimizations Sub-task Resolved Wenchen Fan
          33.
          Subqueries should not be AQE-ed if main query is not Sub-task Resolved Wei Xue
          34.
          Turning off AQE in CacheManager is not thread-safe Sub-task Resolved Wei Xue
          35.
          Refactor AQE readers and RDDs Sub-task Resolved Wei Xue
          36.
          Remove the max split config after changing the multi sub joins to multi sub partitions Sub-task Resolved Ke Jia
          37.
          Don't cancel a QueryStageExec when it's already finished Sub-task Resolved wuyi
          38.
          Add config for AQE logging level Sub-task Resolved Wei Xue
          39.
          Make more efficient and clean up AQE update UI code Sub-task Resolved Wei Xue
          40.
          Replace `Array` with `Seq` in AQE `CustomShuffleReaderExec` Sub-task Resolved Wei Xue
          41.
          AQE will use the same SubqueryExec even if subqueryReuseEnabled=false Sub-task In Progress Unassigned
          42.
          NPE in OptimizeSkewedJoin when there's a inputRDD of plan has 0 partition Sub-task Resolved wuyi
          43.
          SQL UI doesn't show write commands of AQE plan Sub-task Resolved Manu Zhang
          44.
          The final AdaptiveSparkPlan event is not marked with `isFinalPlan=true` Sub-task Resolved Manu Zhang
          45.
          Optimize Skewed BroadcastNestedLoopJoin with AE Sub-task In Progress Unassigned

            Activity

              People

              • Assignee:
                Unassigned
                Reporter:
                cloud_fan Wenchen Fan
              • Votes:
                0 Vote for this issue
                Watchers:
                5 Start watching this issue

                Dates

                • Created:
                  Updated:
                  Resolved: