Details
-
Epic
-
Status: Open
-
Major
-
Resolution: Unresolved
-
None
-
None
-
None
Description
Query planning is one of the main factors in high performance, but the current Spark engine requires the execution DAG for a job to be set in advance. Even with costÂ-based optimization, it is hard to know the behavior of data and user-defined functions well enough to always get great execution plans. This JIRA proposes to add adaptive query execution, so that the engine can change the plan for each query as it sees what data earlier stages produced.
We propose adding this to Spark SQL / DataFrames first, using a new API in the Spark engine that lets libraries run DAGs adaptively. In future JIRAs, the functionality could be extended to other libraries or the RDD API, but that is more difficult than adding it in SQL.
I've attached a design doc by Yin Huai and myself explaining how it would work in more detail.
Attachments
Attachments
Issue Links
- is a parent of
-
SPARK-28231 Do not modify the number of partitions set by the user when using repartition
- Open
- is duplicated by
-
SPARK-4630 Dynamically determine optimal number of partitions
- Closed
- is related to
-
SPARK-23128 The basic framework for the new Adaptive Query Execution
- Resolved
-
SPARK-31412 New Adaptive Query Execution in Spark SQL
- Closed
1.
|
Support submitting map stages individually in DAGScheduler | Resolved | Matei Alexandru Zaharia | ||
2.
|
Let reduce tasks fetch multiple map output partitions | Resolved | Matei Alexandru Zaharia | ||
3.
|
Introduce an ExchangeCoordinator to estimate the number of post-shuffle partitions. | Resolved | Yin Huai | ||
4.
|
Aggregation: Determine the number of reducers at runtime | Resolved | Yin Huai | ||
5.
|
Join: Determine the join strategy (broadcast join or shuffle join) at runtime | Resolved | Unassigned | ||
6.
|
Join: Determine the number of reducers used by a shuffle join operator at runtime | Resolved | Yin Huai | ||
7.
|
Join: Handling data skew | Resolved | Unassigned | ||
8.
|
Improve the way that we plan shuffled join | Resolved | Unassigned | ||
9.
|
Allow shuffle readers to request data from just one mapper | Resolved | Unassigned |