[SPARK-2387] Remove the stage barrier for better resource utilization - ASF JIRA

Details

Type: New Feature
Status: Resolved
Priority: Major
Resolution: Incomplete
Affects Version/s: None
Fix Version/s: None
Component/s: Scheduler, Spark Core
Labels:
- bulk-closed

Description

DAGScheduler divides a Spark job into multiple stages according to RDD dependencies. Whenever there is a shuffle dependency, DAGScheduler creates a shuffle map stage on the map side and a second stage that depends on it.
Currently, a downstream stage cannot start until all the stages it depends on have finished. This barrier between stages leaves slots idle while the last few upstream tasks finish, which wastes cluster resources.
We therefore propose removing the barrier and pre-starting the reduce stage as soon as free slots are available. This can improve resource utilization and overall job performance, especially when the application has been granted many executors.
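
To make the idle-slot cost of the barrier concrete, here is a minimal toy model, not Spark's scheduler. It assumes map tasks are assigned greedily to whichever slot frees up first, and it measures how much slot-time sits unused between each slot's last map task and the moment the barrier lifts. The function name `idle_slot_time` and the task durations are hypothetical and chosen only for illustration.

```python
import heapq


def idle_slot_time(task_durations, num_slots):
    """Toy model of a stage barrier (illustrative assumption, not Spark code).

    Map tasks are placed greedily on the slot that becomes free first.
    Returns (makespan, idle), where makespan is the time the last map task
    finishes and idle is the total slot-time left unused before that point.
    """
    # Each heap entry is the time at which one slot becomes free.
    slots = [0.0] * num_slots
    heapq.heapify(slots)

    for duration in task_durations:
        start = heapq.heappop(slots)           # earliest free slot
        heapq.heappush(slots, start + duration)

    makespan = max(slots)                      # the barrier lifts here
    idle = sum(makespan - free_at for free_at in slots)
    return makespan, idle
```

With three short tasks and one straggler, `idle_slot_time([1, 1, 1, 5], 4)` returns `(5.0, 12.0)`: three of the four slots wait 4 time units each for the straggler. That waiting slot-time is what the proposal aims to give to pre-started reduce tasks. With evenly sized tasks, `idle_slot_time([2, 2, 2, 2], 2)` returns `(4.0, 0.0)` and the barrier costs nothing in this model.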

Issue Links

is depended upon by

- SPARK-3145 Hive on Spark umbrella (Resolved)

relates to

- SPARK-20178 Improve Scheduler fetch failures (Resolved)

links to

[Github] Pull Request #1328 (lirui-intel)

[Github] Pull Request #3430 (lianhuiwang)

Proof of concept patch

People

Assignee: Unassigned

Reporter: Rui Li

Votes: 3

Watchers: 26

Dates

Created: 07/Jul/14 11:27

Updated: 25/May/21 01:52

Resolved: 25/May/21 01:38