[SPARK-21509] Add a config to enable adaptive query execution only for the last query execution. - ASF JIRA

Details

Type: Improvement
Status: Closed
Priority: Major
Resolution: Won't Fix
Affects Version/s: 2.2.0
Fix Version/s: None
Component/s: SQL
Labels: None

Description

Adaptive query execution is a good way to avoid generating too many small files on HDFS, as mentioned in SPARK-16188.
When adaptive query execution is enabled, all shuffles are coordinated. This has several drawbacks:
1. It is hard to balance the number of reducers (which determines processing speed) against the size of the files written to HDFS.
2. It introduces some unnecessary shuffles (https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/EnsureRequirements.scala#L101).
3. It generates many jobs, which adds scheduling overhead.
We could add a config that enables adaptive query execution only for the last shuffle.
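To make drawback 1 and the proposal concrete, here is a minimal Python sketch, assuming a simplified model of shuffle coordination: adjacent pre-shuffle partitions are merged into one reducer until their combined size would exceed a target, loosely in the spirit of the Spark 2.x exchange coordinator (where `spark.sql.adaptive.shuffle.targetPostShuffleInputSize` plays the role of `target_size`). This is not Spark's code. The function names and the `last_shuffle_only` flag are hypothetical, and treating each shuffle as its own coordination group is a simplification.

```python
from typing import List, Sequence


def estimate_partition_start_indices(
    bytes_by_partition: Sequence[Sequence[int]],
    target_size: int,
) -> List[int]:
    """Merge adjacent pre-shuffle partitions into post-shuffle partitions.

    bytes_by_partition[s][i] is the number of bytes shuffle s produces for
    pre-shuffle partition i; all shuffles in the group share one partition
    count. Returns the start index of each post-shuffle partition (reducer).
    """
    num_partitions = len(bytes_by_partition[0])
    starts = [0]
    current = 0
    for i in range(num_partitions):
        # Bytes a reducer would read for partition i, summed over all shuffles.
        nxt = sum(sizes[i] for sizes in bytes_by_partition)
        if i > 0 and current + nxt > target_size:
            starts.append(i)  # adding partition i would overshoot: new reducer
            current = nxt
        else:
            current += nxt
    return starts


def reducers_per_shuffle(
    shuffle_sizes: List[List[int]],
    target_size: int,
    last_shuffle_only: bool,
) -> List[int]:
    """Number of reducers for each shuffle, in execution order.

    HYPOTHETICAL flag: when last_shuffle_only is True, only the final shuffle
    (the one feeding the HDFS write) is coalesced; earlier shuffles keep
    their static partition count (e.g. spark.sql.shuffle.partitions).
    """
    result = []
    for idx, sizes in enumerate(shuffle_sizes):
        is_last = idx == len(shuffle_sizes) - 1
        if last_shuffle_only and not is_last:
            result.append(len(sizes))  # left uncoordinated
        else:
            result.append(len(estimate_partition_start_indices([sizes], target_size)))
    return result


if __name__ == "__main__":
    shuffles = [[10] * 8, [10, 20, 30, 5, 5, 40]]
    print(reducers_per_shuffle(shuffles, 50, last_shuffle_only=False))  # [2, 3]
    print(reducers_per_shuffle(shuffles, 50, last_shuffle_only=True))   # [8, 3]
```

With every shuffle coordinated, the first stage collapses from 8 reducers to 2, so the single target that keeps output files large also slows the intermediate stage. Restricting coordination to the last shuffle keeps the intermediate parallelism while still merging the partitions that become files.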

Issue Links

links to

[Github] Pull Request #18713 (jinxing64)

People

Assignee:: Unassigned

Reporter:: Jin Xing

Votes:: 0

Watchers:: 2

Dates

Created:: 22/Jul/17 14:25

Updated:: 28/Jul/17 04:08

Resolved:: 28/Jul/17 04:08