[SPARK-31412] New Adaptive Query Execution in Spark SQL - ASF JIRA


Details

Type: New Feature
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: 3.0.0
Fix Version/s: 3.0.0
Component/s: SQL
Labels: None

Description

SPARK-9850 proposed the basic idea of adaptive execution in Spark, and a new DAGScheduler API was added to support submitting a single map stage. The current implementation of adaptive execution in Spark SQL supports changing the number of reducers at runtime: an ExchangeCoordinator determines the number of post-shuffle partitions for a stage that fetches shuffle data from one or more stages. The ExchangeCoordinator is added at the same time as the Exchanges, and this design has several limitations.

First, it may introduce additional shuffles that decrease performance; this can be seen in the EnsureRequirements rule when it adds an ExchangeCoordinator. Second, adding ExchangeCoordinators while adding Exchanges is a poor fit, because at that point we do not have a global picture of all the shuffle dependencies of a post-shuffle stage. For example, when three tables are joined in a single stage, the same ExchangeCoordinator should be used for all three Exchanges, but currently two separate ExchangeCoordinators are added. Third, the current framework makes it hard to flexibly implement other adaptive execution features, such as changing the execution plan or handling skewed joins at runtime.
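The runtime coalescing of post-shuffle partitions described above can be sketched as follows. This is a minimal illustration, not Spark's ExchangeCoordinator: the object `CoalesceSketch`, its `coalesce` method, and the greedy packing strategy are hypothetical. It assumes every shuffle feeding the stage has the same number of reduce partitions and that per-partition byte sizes are known once the map stages finish. Summing sizes across all input shuffles before choosing boundaries yields one shared set of partition ranges for every Exchange of the stage, which is the kind of global view the second limitation calls for.

```scala
import scala.collection.mutable.ArrayBuffer

object CoalesceSketch {
  /** Hypothetical helper: greedily merges contiguous post-shuffle partitions.
    * `shuffleSizes(i)(p)` is the byte size of reduce partition p in shuffle i.
    * Returns half-open ranges [start, end) of original partition indices.
    */
  def coalesce(shuffleSizes: Seq[Array[Long]], targetSize: Long): Seq[(Int, Int)] = {
    require(shuffleSizes.nonEmpty, "need at least one input shuffle")
    val numPartitions = shuffleSizes.head.length
    require(shuffleSizes.forall(_.length == numPartitions),
      "all shuffles feeding one stage must have the same partition count")

    // Combined input size of each post-shuffle partition across ALL shuffles,
    // so every join side ends up with identical partition groups.
    val combined = Array.tabulate(numPartitions)(p => shuffleSizes.map(_(p)).sum)

    val ranges = ArrayBuffer.empty[(Int, Int)]
    var start = 0
    var current = 0L
    for (p <- 0 until numPartitions) {
      // Close the current group if adding partition p would exceed the target.
      if (p > start && current + combined(p) > targetSize) {
        ranges += ((start, p))
        start = p
        current = 0L
      }
      current += combined(p)
    }
    if (numPartitions > 0) ranges += ((start, numPartitions))
    ranges.toSeq
  }
}
```

With two input shuffles whose per-partition sizes sum to 20, 30, 60, 10 and 10 bytes, a 64-byte target yields the ranges (0,2), (2,3) and (3,5), so both sides of the join read the same partition groups.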

We'd like to introduce a new way to do adaptive execution in Spark SQL that addresses these limitations. The idea is described at https://docs.google.com/document/d/1mpVjvQZRAkD-Ggy6-hcjXtBPiQoVbZGe3dLnAKgtJ4k/edit?usp=sharing


Issue Links

causes: SPARK-33822 TPCDS Q5 fails if spark.sql.adaptive.enabled=true (Resolved)

is related to: SPARK-33828 SQL Adaptive Query Execution QA (Closed)

relates to: SPARK-9850 Adaptive execution in Spark (Open)

Sub-Tasks

1.	The basic framework for the new Adaptive Query Execution	Resolved	Carson Wang
2.	Adjust post shuffle partition number in adaptive execution	Resolved	Carson Wang
3.	Disable OptimizeSkewJoin rule if introducing additional shuffle.	Resolved	Ke Jia
4.	collect the runtime statistics of row count in map stage	Open	Unassigned
5.	Optimize skewed join at runtime with new Adaptive Execution	Resolved	Ke Jia
6.	add metrics to AQE shuffle reader	Resolved	Wenchen Fan
7.	add an individual config for skewed partition threshold	Resolved	Wenchen Fan
8.	optimize skew join after shuffle partitions are coalesced	Resolved	Wenchen Fan
9.	make skew join split skewed partitions more evenly	Resolved	Wenchen Fan
10.	refine AQE config names	Resolved	Wenchen Fan
11.	Dynamically reuse subqueries in AQE	Resolved	Wei Xue
12.	Add a simple cost check for Adaptive Query Execution	Resolved	Wei Xue
13.	Optimize shuffle reader to local shuffle reader when smj converted to bhj in adaptive execution	Resolved	Ke Jia
14.	improve the splitting of skewed partitions	Resolved	Wenchen Fan
15.	Add the user guide for Adaptive Query Execution	Resolved	Ke Jia
16.	change the default value of minPartitionNum in AQE	Resolved	Wenchen Fan
17.	Avoid changing SMJ to BHJ if the build side has a high ratio of empty partitions	Resolved	Wei Xue
18.	Add tree traversal helper for adaptive spark plans	Resolved	Wei Xue
19.	Optimize shuffle fetch of contiguous partition IDs	Resolved	Yuanjian Li
20.	LocalShuffleReaderExec.outputPartitioning should use the corrected attributes	Resolved	Wenchen Fan
21.	Improve the local reader performance by changing the task number from 1 to multi	Resolved	Ke Jia
22.	Reading of csv file fails with adaptive execution turned on	Resolved	Wenchen Fan
23.	Catch the exception when do materialize in AQE	Resolved	Ke Jia
24.	Add adaptive execution context	Resolved	Wei Xue
25.	remove ReusedQueryStageExec	Resolved	Wenchen Fan
26.	Fix tests when enable Adaptive Query Execution	Resolved	Ke Jia
27.	reset the metrics info of AdaptiveSparkPlanExec plan when enable aqe	Resolved	Ke Jia
28.	Fix the NoSuchElementException exception when enable AQE with InSubquery use case	Resolved	Ke Jia
29.	Fix the subquery metrics showing issue in UI When enable AQE	Resolved	Ke Jia
30.	coalesce shuffle reader with splitting shuffle fetch request fails	Resolved	Wenchen Fan
31.	AQE should not issue a "not supported" warning for queries being by-passed	Resolved	Wenchen Fan
32.	Combine the skewed readers into one in AQE skew join optimizations	Resolved	Wenchen Fan
33.	Subqueries should not be AQE-ed if main query is not	Resolved	Wei Xue
34.	Turning off AQE in CacheManager is not thread-safe	Resolved	Wei Xue
35.	Refactor AQE readers and RDDs	Resolved	Wei Xue
36.	Remove the max split config after changing the multi sub joins to multi sub partitions	Resolved	Ke Jia
37.	Don't cancel a QueryStageExec when it's already finished	Resolved	wuyi
38.	Add config for AQE logging level	Resolved	Wei Xue
39.	Make more efficient and clean up AQE update UI code	Resolved	Wei Xue
40.	Replace `Array` with `Seq` in AQE `CustomShuffleReaderExec`	Resolved	Wei Xue
41.	AQE will use the same SubqueryExec even if subqueryReuseEnabled=false	Resolved	Unassigned
42.	NPE in OptimizeSkewedJoin when there's a inputRDD of plan has 0 partition	Resolved	wuyi
43.	SQL UI doesn't show write commands of AQE plan	Resolved	Manu Zhang
44.	The final AdaptiveSparkPlan event is not marked with `isFinalPlan=true`	Resolved	Manu Zhang
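Several sub-tasks above (5, 7, 9 and 14) deal with handling skewed joins at runtime. The sketch below illustrates the general idea under stated assumptions: a reduce partition is treated as skewed when it is both `factor` times larger than the median partition size and larger than an absolute byte threshold, and a skewed partition is then split into ranges of contiguous map outputs so that each split reads roughly a target size (the matching partition on the other join side would be read once per split). The object `SkewJoinSketch` and all of its methods and parameters are hypothetical; this is not the code of the `OptimizeSkewedJoin` rule named in sub-task 42, although Spark 3.0 documents configs such as `spark.sql.adaptive.skewJoin.skewedPartitionFactor` and `spark.sql.adaptive.skewJoin.skewedPartitionThresholdInBytes` whose roles these parameters loosely mirror.

```scala
import scala.collection.mutable.ArrayBuffer

object SkewJoinSketch {
  // Median partition size; averages the two middle values for even counts.
  def median(sizes: Array[Long]): Long = {
    require(sizes.nonEmpty, "no partitions")
    val s = sizes.sorted
    val n = s.length
    if (n % 2 == 1) s(n / 2) else (s(n / 2 - 1) + s(n / 2)) / 2
  }

  // Indices of reduce partitions larger than both `factor` * median
  // and an absolute byte threshold (hypothetical parameters).
  def skewedPartitions(sizes: Array[Long], factor: Double, thresholdBytes: Long): Seq[Int] = {
    val med = median(sizes)
    sizes.indices.filter(p => sizes(p) > med * factor && sizes(p) > thresholdBytes)
  }

  // Splits one skewed reduce partition into half-open ranges [start, end) of
  // map-task indices, greedily packing contiguous map outputs up to `targetBytes`.
  def splitByMapOutputs(mapSizes: Array[Long], targetBytes: Long): Seq[(Int, Int)] = {
    val out = ArrayBuffer.empty[(Int, Int)]
    var start = 0
    var acc = 0L
    for (m <- mapSizes.indices) {
      if (m > start && acc + mapSizes(m) > targetBytes) {
        out += ((start, m))
        start = m
        acc = 0L
      }
      acc += mapSizes(m)
    }
    if (mapSizes.nonEmpty) out += ((start, mapSizes.length))
    out.toSeq
  }
}
```

For partition sizes 10, 12, 11, 500 and 9 bytes with a factor of 5 and a 100-byte threshold, only partition 3 is flagged; if its map outputs are 100, 150, 200, 50 and 120 bytes, a 256-byte target splits it into the map ranges (0,2), (2,4) and (4,5).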


People

Assignee: Unassigned

Reporter: Wenchen Fan

Votes: 0

Watchers: 9

Dates

Created: 10/Apr/20 08:53

Updated: 17/Dec/20 18:17

Resolved: 10/Apr/20 09:32