[SPARK-2774] Set preferred locations for reduce tasks - ASF JIRA

Details

Type: Improvement
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 1.5.0
Component/s: Spark Core
Labels: None

Description

Spark currently does not set preferred locations for reduce tasks. This patch proposes setting them based on the map output sizes and locations tracked by the MapOutputTracker. This is useful in two situations:

1. When a small job runs in a large cluster, co-locating map and reduce tasks can avoid sending shuffle data over the network.
2. When the map stage outputs are heavily skewed, it is beneficial to place the reducer close to the largest output.
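
The idea in the description can be illustrated with a small sketch. This is plain Python, not Spark's Scala code or the patch itself: the function name `preferred_locations`, the `(host, sizes)` input shape, and the 0.2 threshold are all assumptions made for illustration. For one reducer, it sums the map output bytes held on each host and prefers hosts that hold at least a given fraction of that reducer's input.

```python
from collections import defaultdict
from typing import List, Sequence, Tuple


def preferred_locations(
    map_outputs: Sequence[Tuple[str, Sequence[int]]],
    reducer_id: int,
    fraction_threshold: float = 0.2,  # assumed value, for illustration only
) -> List[str]:
    """Return hosts holding at least `fraction_threshold` of a reducer's input.

    `map_outputs` has one entry per map task: the host that stores the task's
    output and the output size in bytes for each reducer (hypothetical shape).
    """
    bytes_by_host = defaultdict(int)
    for host, sizes in map_outputs:
        bytes_by_host[host] += sizes[reducer_id]

    total = sum(bytes_by_host.values())
    if total == 0:
        return []  # nothing to fetch, so no locality preference

    # Largest holder first, so a skewed output's host is the first choice.
    ranked = sorted(bytes_by_host.items(), key=lambda kv: kv[1], reverse=True)
    return [host for host, size in ranked if size / total >= fraction_threshold]


# Three map tasks on two hosts, two reducers.
outputs = [
    ("host-a", [100, 10]),
    ("host-b", [900, 10]),
    ("host-a", [50, 10]),
]
skewed = preferred_locations(outputs, reducer_id=0)
balanced = preferred_locations(outputs, reducer_id=1)
```

In this example reducer 0 reads most of its input from `host-b`, so only that host passes the threshold, while reducer 1's input is spread widely enough that both hosts qualify. The threshold keeps the scheduler from being constrained when no host holds a meaningful share of the data.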

Issue Links

links to

[Github] Pull Request #1697 (shivaram)

[Github] Pull Request #4576 (shivaram)

[Github] Pull Request #6652 (shivaram)

People

Assignee: Shivaram Venkataraman

Reporter: Shivaram Venkataraman

Votes: 0

Watchers: 11

Dates

Created: 31/Jul/14 18:36

Updated: 10/Jun/15 22:07

Resolved: 10/Jun/15 22:07