Uploaded image for project: 'Hadoop Map/Reduce'
  1. Hadoop Map/Reduce
  2. MAPREDUCE-268

Implement memory-to-memory merge in the reduce

    XMLWordPrintableJSON

Details

    • Improvement
    • Status: Resolved
    • Major
    • Resolution: Duplicate
    • None
    • None
    • None
    • None

    Description

      HADOOP-3446 fixed the reduce to not flush the in-memory shuffled map-outputs before feeding to the reduce. However for latency-sensitive applications with lots of memory like the terasort this hurts performance since the fan-in for the final in-memory merge is too large (all 8000 map-outputs very in-memory) resulting in less than optimal performance.

      When I put in an intermediate memory-to-memory merge for the terasort's reduce (there-by avoiding disk i/o) to cut the fan-in from 8000 to <100 the 'reduce' phase (including the local datanode-write) sped-up 250% (from 10s to 4s).

      Attachments

        Issue Links

          Activity

            People

              acmurthy Arun Murthy
              acmurthy Arun Murthy
              Votes:
              1 Vote for this issue
              Watchers:
              6 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: