[MAPREDUCE-268] Implement memory-to-memory merge in the reduce - ASF JIRA

Details

Type: Improvement
Status: Resolved
Priority: Major
Resolution: Duplicate
Affects Version/s: None
Fix Version/s: None
Component/s: None
Labels: None

Description

~~HADOOP-3446~~ fixed the reduce so that it no longer flushes the in-memory shuffled map-outputs before feeding them to the reduce. However, for latency-sensitive applications with lots of memory, such as the terasort, this hurts performance: the fan-in for the final in-memory merge is too large (all 8000 map-outputs were in memory).

When I put in an intermediate memory-to-memory merge for the terasort's reduce (thereby avoiding disk I/O) to cut the fan-in from 8000 to <100, the 'reduce' phase (including the local datanode write) ran 2.5x faster (from 10s to 4s).
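To illustrate the idea, here is a minimal sketch of an intermediate memory-to-memory merge that caps the fan-in of the final merge. It is not the Hadoop implementation: the class `MemToMemMergeSketch`, the methods `mergeSegments` and `reduceFanIn`, and the use of sorted `int[]` arrays in place of serialized map-output segments are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical sketch: sorted int[] arrays stand in for in-memory map-outputs.
public class MemToMemMergeSketch {

    /** One k-way merge pass over sorted segments, done entirely in memory. */
    public static int[] mergeSegments(List<int[]> segments) {
        int total = 0;
        for (int[] s : segments) {
            total += s.length;
        }
        // Heap entries are {value, segmentIndex, offsetInSegment}, ordered by value.
        PriorityQueue<int[]> heap =
                new PriorityQueue<>((a, b) -> Integer.compare(a[0], b[0]));
        for (int i = 0; i < segments.size(); i++) {
            if (segments.get(i).length > 0) {
                heap.add(new int[] {segments.get(i)[0], i, 0});
            }
        }
        int[] out = new int[total];
        int n = 0;
        while (!heap.isEmpty()) {
            int[] top = heap.poll();
            out[n++] = top[0];
            int[] seg = segments.get(top[1]);
            int next = top[2] + 1;
            if (next < seg.length) {
                heap.add(new int[] {seg[next], top[1], next});
            }
        }
        return out;
    }

    /**
     * Intermediate pass: merges consecutive groups of segments so that at most
     * maxFanIn segments remain for the final merge. No disk I/O is involved.
     */
    public static List<int[]> reduceFanIn(List<int[]> segments, int maxFanIn) {
        if (maxFanIn < 1) {
            throw new IllegalArgumentException("maxFanIn must be >= 1");
        }
        if (segments.size() <= maxFanIn) {
            return segments;
        }
        // Ceiling division: the group size needed to end up with <= maxFanIn groups.
        int groupSize = (segments.size() + maxFanIn - 1) / maxFanIn;
        List<int[]> merged = new ArrayList<>();
        for (int start = 0; start < segments.size(); start += groupSize) {
            int end = Math.min(start + groupSize, segments.size());
            merged.add(mergeSegments(segments.subList(start, end)));
        }
        return merged;
    }
}
```

With 8000 input segments and `maxFanIn = 100`, `reduceFanIn` merges groups of 80 and leaves 100 segments, so the final `mergeSegments` call runs over a much smaller heap. The trade-off is that every record is copied once more in memory during the intermediate pass.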

Issue Links

is part of

MAPREDUCE-318 Refactor reduce shuffle code

Closed

People

Assignee: Arun Murthy

Reporter: Arun Murthy

Votes: 1

Watchers: 6

Dates

Created: 14/May/09 07:31

Updated: 05/Jun/12 02:37

Resolved: 14/Sep/09 04:57