Details
- Bug
- Status: Resolved
- Major
- Resolution: Fixed
- 2.6.0
- None
- None
- Reviewed
Description
ShuffleHandler currently appears to build a map of mapId to mapInfo (file.out / index information) each time it receives a message.
It should cache map info across requests, so that a scan of all directories is not required for each reducer fetching from the same map.
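To make the suggestion concrete, here is a minimal sketch of a bounded cache shared across requests, using only the JDK. It is not the ShuffleHandler code or the committed fix: the class name MapOutputPathCache, the PathInfo type, the key layout, and the resolver function (a stand-in for the directory scan) are all hypothetical. The key deliberately leaves out the reduce id, since every reducer fetching from the same map needs the same file locations.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical sketch: bounded LRU cache of map output locations, shared across requests. */
class MapOutputPathCache {

    /** Hypothetical value type: resolved locations of file.out and its index file. */
    static final class PathInfo {
        final String dataPath;
        final String indexPath;

        PathInfo(String dataPath, String indexPath) {
            this.dataPath = dataPath;
            this.indexPath = indexPath;
        }
    }

    private final Function<String, PathInfo> resolver; // stands in for the directory scan
    private final Map<String, PathInfo> cache;
    private int scans = 0;

    MapOutputPathCache(final int maxEntries, Function<String, PathInfo> resolver) {
        this.resolver = resolver;
        // accessOrder = true makes iteration order least-recently-used first,
        // so removeEldestEntry evicts the LRU entry once the bound is exceeded.
        this.cache = new LinkedHashMap<String, PathInfo>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, PathInfo> eldest) {
                return size() > maxEntries;
            }
        };
    }

    /** Returns the cached info for this map output, scanning only on a miss. */
    synchronized PathInfo get(String user, String jobId, String mapId) {
        String key = user + "/" + jobId + "/" + mapId; // no reduceId in the key
        PathInfo info = cache.get(key);
        if (info == null) {
            info = resolver.apply(key);
            scans++;
            cache.put(key, info);
        }
        return info;
    }

    /** Number of times the resolver (the expensive scan) was invoked. */
    synchronized int scanCount() {
        return scans;
    }

    public static void main(String[] args) {
        MapOutputPathCache c = new MapOutputPathCache(
            1000, k -> new PathInfo(k + "/file.out", k + "/file.out.index"));
        c.get("user", "job_1", "attempt_m_0"); // first reducer: scan
        c.get("user", "job_1", "attempt_m_0"); // second reducer: cache hit
        System.out.println("scans = " + c.scanCount());
    }
}
```

A single synchronized method keeps the sketch short. A real handler serving many concurrent fetches would more likely want a concurrent cache with expiry, and would need to drop entries when a job's outputs are cleaned up.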
Also, the scan for each map output / index file is performed twice per mapId within a request: in populateHeaders, once through the call to getMapOutputInfo, and then again directly in the method.
For an invocation that does end up with more than 1000 (the default) mapIds in a single call, some of which are not cached in the map, the path constructed for those uncached entries will be invalid. This is highly unlikely to happen in practice, though, until there is proper caching.
MapOutputInfo info = mapOutputInfoMap.get(mapId);
if (info == null) {
  info = getMapOutputInfo(outputBasePathStr, mapId, reduceId, user);
}
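One way to avoid resolving the same mapId twice within a request is to route both the header pass and the send pass through a single memoizing lookup. The sketch below is only an illustration: RequestScopedLookup and its scan function are hypothetical names, the String value stands in for MapOutputInfo, and it does not model the 1000-entry limit mentioned above.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical sketch: per-request memoization so each mapId is resolved at most once. */
class RequestScopedLookup {
    private final Map<String, String> resolved = new HashMap<>();
    private final Function<String, String> scan; // stands in for getMapOutputInfo
    private int scans = 0;

    RequestScopedLookup(Function<String, String> scan) {
        this.scan = scan;
    }

    /** Used by both the header pass and the send pass; scans only on first use. */
    String lookup(String mapId) {
        return resolved.computeIfAbsent(mapId, id -> {
            scans++;
            return scan.apply(id);
        });
    }

    /** Number of times the underlying scan was actually run. */
    int scanCount() {
        return scans;
    }

    public static void main(String[] args) {
        RequestScopedLookup req = new RequestScopedLookup(id -> "/base/" + id + "/file.out");
        req.lookup("attempt_m_0"); // header pass
        req.lookup("attempt_m_0"); // send pass reuses the result
        System.out.println("scans = " + req.scanCount());
    }
}
```

The object is meant to live for one request and is not thread-safe, which is sufficient if a request is handled on a single thread.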
Attachments
Issue Links
- relates to
  - MAPREDUCE-7237 Supports config the shuffle's path cache related parameters (Resolved)