[MAPREDUCE-259] Rack-aware Shuffle - ASF JIRA

Details

Type: Improvement
Status: Resolved
Priority: Major
Resolution: Duplicate
Affects Version/s: None
Fix Version/s: None
Component/s: None
Labels: None

Description

We could experiment with rack-aware scheduling of fetches per reducer. Given the disparity between in-rack and off-rack bandwidth, it could be an improvement to do something along these lines:

if (no. of known map-output locations > no. of copier threads) {
  try to schedule 75% of copies off-rack
  try to schedule 25% of copies in-rack
}

This could lead to better utilization of both in-rack and cross-switch bandwidth.

Clearly we want to schedule more cross-switch copies than in-rack ones, since off-rack copies take significantly longer; hence the 75-25 split.
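
As a rough illustration of the proposal, here is a minimal sketch in Python (not Hadoop code) of how one batch of fetches might be picked with the 75-25 split. The names `MapOutput` and `pick_fetches` are hypothetical, the split is rounded down to whole copier slots, and the backfill step (handing unused slots to the other group when one group runs short) is an assumption that the description above does not specify.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MapOutput:
    """Hypothetical record of a known map-output location."""
    map_id: str
    host: str
    rack: str


def pick_fetches(known_outputs, reducer_rack, num_copiers, off_rack_share=0.75):
    """Pick up to num_copiers map outputs for the next batch of copies."""
    # The split only applies when there are more known locations than copiers.
    if len(known_outputs) <= num_copiers:
        return list(known_outputs)

    off_rack = [o for o in known_outputs if o.rack != reducer_rack]
    in_rack = [o for o in known_outputs if o.rack == reducer_rack]

    # Aim for roughly 75% of the copier slots off-rack (rounded down).
    want_off = min(len(off_rack), int(num_copiers * off_rack_share))
    want_in = min(len(in_rack), num_copiers - want_off)
    # Assumption: if in-rack outputs run short, give the spare slots to off-rack.
    want_off = min(len(off_rack), num_copiers - want_in)

    return off_rack[:want_off] + in_rack[:want_in]


if __name__ == "__main__":
    outputs = [MapOutput(f"m{i}", f"host{i}", "rackA" if i < 4 else "rackB")
               for i in range(8)]
    for chosen in pick_fetches(outputs, reducer_rack="rackA", num_copiers=4):
        print(chosen.map_id, chosen.rack)
```

With 4 copier threads and 8 known locations split evenly across two racks, this picks 3 off-rack outputs and 1 in-rack output. A real implementation would also need to track in-flight copies and per-host limits, which this sketch ignores.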

Issue Links

is blocked by

HADOOP-1266 Remove DatanodeDescriptor dependency from NetworkTopology

Closed

is duplicated by

MAPREDUCE-2038 Making reduce tasks locality-aware

Open

People

Assignee: Arun Murthy
Reporter: Arun Murthy
Votes: 0
Watchers: 6

Dates

Created: 16/Aug/07 10:58
Updated: 17/Jul/14 17:37
Resolved: 17/Jul/14 17:37