Details
- Improvement
- Status: Patch Available
- Major
- Resolution: Unresolved
- 3.0.0-alpha1
- None
Description
Shuffle is expensive in Hadoop even when a combiner is used, because combining is scoped to a single MapTask. One way to reduce this cost is to aggregate map outputs per node or per rack by launching a combiner at that level.
This JIRA is to implement the multi-level aggregation infrastructure, including combining per container (MAPREDUCE-3902 is related) and coordinating containers through the application master without breaking the fault tolerance of jobs.
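To make the expected saving concrete, here is a minimal sketch in plain Python (not Hadoop code, and not the implementation proposed in this issue). It counts how many records would be shuffled with a per-MapTask combiner only, and again when the combined outputs of tasks on the same node are combined a second time. The names `combine`, `shuffle_volume`, `map_outputs`, and `node_of_task` are hypothetical, and the combiner is assumed to be a word-count style sum, which is associative and commutative and therefore safe to apply more than once.

```python
from collections import Counter, defaultdict


def combine(records):
    """Sum values per key, like a word-count combiner (illustrative only)."""
    totals = Counter()
    for key, value in records:
        totals[key] += value
    return sorted(totals.items())


def shuffle_volume(map_outputs, node_of_task, node_level):
    """Count the records that would be sent to reducers.

    map_outputs:  {task_id: [(key, value), ...]} raw output of each map task
    node_of_task: {task_id: node_id}, a hypothetical task placement
    node_level:   if True, combine again across tasks on the same node
    """
    # Level 1: the existing combiner, scoped to one MapTask.
    per_task = {task: combine(recs) for task, recs in map_outputs.items()}
    if not node_level:
        return sum(len(recs) for recs in per_task.values())

    # Level 2: merge the already combined outputs of co-located tasks.
    per_node = defaultdict(list)
    for task, recs in per_task.items():
        per_node[node_of_task[task]].extend(recs)
    return sum(len(combine(recs)) for recs in per_node.values())


# Toy input: three map tasks, two of them on the same node.
map_outputs = {
    "m1": [("a", 1), ("b", 1), ("a", 1)],
    "m2": [("a", 1), ("b", 1)],
    "m3": [("a", 1), ("c", 1)],
}
node_of_task = {"m1": "node1", "m2": "node1", "m3": "node2"}

print(shuffle_volume(map_outputs, node_of_task, node_level=False))
print(shuffle_volume(map_outputs, node_of_task, node_level=True))
```

In this toy input the saving comes only from `m1` and `m2`, which share a node and share keys. The sketch ignores everything that makes the real problem hard, such as coordinating containers and preserving fault tolerance when a node-level combine fails.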
Attachments
Issue Links
- is related to: TAJO-374 Investigate more efficient intermediate shuffle methods (Resolved)