Details
- Type: Improvement
- Status: Resolved
- Priority: Major
- Resolution: Duplicate
Description
We could experiment with rack-aware scheduling of fetches per reducer. Given the disparity between in-rack and off-rack bandwidth, it could be an improvement to do something along these lines:
if (number of known map-output locations > number of copier threads) {
  try to schedule 75% of copies off-rack
  try to schedule 25% of copies in-rack
}
This could lead to better utilization of both in-rack and switch bandwidth.
We want to schedule more off-rack (cross-switch) copies than in-rack copies because off-rack copies take significantly longer; hence the 75-25 split.
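The split proposed above could be sketched roughly as follows. This is only an illustration of the idea, not code from Hadoop: the class RackAwareFetchPlanner, the MapOutputLocation holder, and the plan method are hypothetical names, and the rounding and fallback behaviour (handing unused slots to the other pool when one side runs short) are assumptions the description does not specify.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of the 75/25 off-rack/in-rack fetch split; not Hadoop code. */
public class RackAwareFetchPlanner {

    /** Hypothetical holder for one known map-output location. */
    static final class MapOutputLocation {
        final String host;
        final String rack;

        MapOutputLocation(String host, String rack) {
            this.host = host;
            this.rack = rack;
        }
    }

    /** Share of copier threads aimed at off-rack hosts (the 75% in the proposal). */
    static final double OFF_RACK_FRACTION = 0.75;

    /** Picks the next batch of locations to fetch, at most one per copier thread. */
    static List<MapOutputLocation> plan(List<MapOutputLocation> known,
                                        String reducerRack,
                                        int copierThreads) {
        // With no more locations than threads there is nothing to choose between.
        if (known.size() <= copierThreads) {
            return new ArrayList<>(known);
        }

        // Partition known locations by whether they share the reducer's rack.
        List<MapOutputLocation> inRack = new ArrayList<>();
        List<MapOutputLocation> offRack = new ArrayList<>();
        for (MapOutputLocation loc : known) {
            if (loc.rack.equals(reducerRack)) {
                inRack.add(loc);
            } else {
                offRack.add(loc);
            }
        }

        // Aim for 75% off-rack; the rounding rule is an assumption.
        int offTarget = (int) Math.round(copierThreads * OFF_RACK_FRACTION);
        int offCount = Math.min(offTarget, offRack.size());
        int inCount = Math.min(copierThreads - offCount, inRack.size());
        // If the in-rack pool is short, hand the spare slots back to off-rack.
        offCount = Math.min(copierThreads - inCount, offRack.size());

        List<MapOutputLocation> batch = new ArrayList<>(offRack.subList(0, offCount));
        batch.addAll(inRack.subList(0, inCount));
        return batch;
    }
}
```

With 4 copier threads, 6 off-rack locations, and 4 in-rack locations, this sketch selects 3 off-rack and 1 in-rack fetch. A real implementation would also need to obtain each host's rack from the cluster topology, which is presumably why this issue lists a NetworkTopology change as a blocker.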
Issue Links
- is blocked by: HADOOP-1266 Remove DatanodeDescriptor dependency from NetworkTopology (Closed)
- is duplicated by: MAPREDUCE-2038 Making reduce tasks locality-aware (Open)