Hadoop Map/Reduce / MAPREDUCE-64

Map-side sort is hampered by io.sort.record.percent

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.21.0
    • Component/s: performance, task
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      Currently io.sort.record.percent is a fairly obscure, per-job configurable, expert-level parameter that controls how much accounting space is available for records in the map-side sort buffer (io.sort.mb). Typical values for io.sort.mb (100 MB) and io.sort.record.percent (0.05) imply that we can store ~350,000 records in the buffer before necessitating a sort/combine/spill.
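To make the arithmetic concrete, here is a minimal Python sketch of how a static split of io.sort.mb turns into a record limit. It assumes (the issue does not say this) that every buffered record costs a fixed 16 bytes of accounting space, which is our understanding of the pre-fix map-side buffer; the function name `split_sort_buffer` is hypothetical. Under that assumption the defaults yield 327,680 record slots, in the same ballpark as the ~350,000 quoted above. The sketch ignores any soft spill threshold.

```python
# ASSUMPTION (not stated in the issue): each buffered record costs a fixed
# 16 bytes of accounting space, independent of its serialized size.
ACCOUNTING_BYTES_PER_RECORD = 16


def split_sort_buffer(io_sort_mb: int, io_sort_record_percent: float) -> tuple[int, int]:
    """Hypothetical helper: return (max_records, data_bytes) for a static split."""
    total_bytes = io_sort_mb << 20  # io.sort.mb is expressed in megabytes
    accounting_bytes = int(total_bytes * io_sort_record_percent)
    # Round down to a whole number of fixed-size accounting slots.
    accounting_bytes -= accounting_bytes % ACCOUNTING_BYTES_PER_RECORD
    max_records = accounting_bytes // ACCOUNTING_BYTES_PER_RECORD
    # Whatever is not reserved for accounting holds the serialized records.
    data_bytes = total_bytes - accounting_bytes
    return max_records, data_bytes


records, data = split_sort_buffer(100, 0.05)
print(records, data)  # 327680 99614720
```

Raising io.sort.record.percent buys more record slots only by shrinking the data area, which is the per-job tuning the issue describes as tedious.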

      However, for many applications that deal with small records, e.g. the world-famous wordcount and its family, this implies we can only use 5-10% of io.sort.mb (i.e. 5-10 MB) before we spill, in spite of having much more memory available in the sort buffer. Wordcount, for example, results in ~12 spills (given an HDFS block size of 64 MB). The presence of a combiner exacerbates the problem: since the combiner runs on every spill, each extra spill piles on another round of record serialization/deserialization.
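The following sketch, under the same assumed cost of 16 accounting bytes per record, shows which limit is reached first for a given average serialized record size. The function `bytes_buffered_before_spill` and the record sizes used with it are hypothetical illustrations, not measurements. With the default split, average records of 16 to 32 bytes exhaust the accounting space after exactly 5 to 10% of io.sort.mb has been filled with data, which matches the range described above.

```python
# ASSUMPTION (not stated in the issue): a fixed 16 bytes of accounting per record.
ACCOUNTING_BYTES_PER_RECORD = 16


def bytes_buffered_before_spill(io_sort_mb, io_sort_record_percent, avg_record_bytes):
    """Hypothetical helper: return (records, data_bytes_used, binding_limit)."""
    total = io_sort_mb << 20
    acct = int(total * io_sort_record_percent)
    acct -= acct % ACCOUNTING_BYTES_PER_RECORD
    record_slots = acct // ACCOUNTING_BYTES_PER_RECORD  # limit from accounting space
    data_capacity = total - acct
    fits_by_data = data_capacity // avg_record_bytes    # limit from data space
    if record_slots <= fits_by_data:
        # Small records: accounting fills first and most data space sits idle.
        return record_slots, record_slots * avg_record_bytes, "accounting"
    # Large records: the data area fills first and record slots go unused.
    return fits_by_data, fits_by_data * avg_record_bytes, "data"


records, used, limit = bytes_buffered_before_spill(100, 0.05, 20)
print(records, used, limit, used / (100 << 20))  # 327680 6553600 accounting 0.0625
```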

      Sure, jobs can configure io.sort.record.percent, but it's tedious and obscure; we really can do better by getting the framework to pick it automagically, using all available memory (up to io.sort.mb) for either the data or the accounting.
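One way to picture the proposed behaviour is a single pool in which each record is charged its serialized size plus its accounting cost, so that whichever kind of space a job needs can grow until io.sort.mb is exhausted. The sketch below is a simplified capacity model under an assumed accounting cost of 16 bytes per record; `shared_buffer_capacity` is a hypothetical name, and the model does not reproduce the buffer layout used in the attached patches.

```python
# ASSUMPTION (not stated in the issue): a fixed 16 bytes of accounting per record.
ACCOUNTING_BYTES_PER_RECORD = 16


def shared_buffer_capacity(io_sort_mb, avg_record_bytes):
    """Hypothetical model: records and data bytes that fit before a spill when
    data and accounting share one io.sort.mb pool with no fixed split."""
    total = io_sort_mb << 20
    # Each record consumes its own bytes plus its accounting entry from the pool.
    per_record = avg_record_bytes + ACCOUNTING_BYTES_PER_RECORD
    records = total // per_record
    return records, records * avg_record_bytes


print(shared_buffer_capacity(100, 20))    # (2912711, 58254220)
print(shared_buffer_capacity(100, 1000))  # (103206, 103206000)
```

With 20-byte records, 100 MB then holds 2,912,711 records (about 58 MB of data) before a spill, whereas a fixed 5% accounting split under the same assumption stops at 327,680 records (about 6.5 MB). With 1,000-byte records the shared pool still buffers slightly more data, because no space is reserved for record slots that go unused.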

        Attachments

        1. M64-0.patch (80 kB, Chris Douglas)
        2. M64-1.patch (86 kB, Chris Douglas)
        3. M64-2.patch (89 kB, Chris Douglas)
        4. M64-3.patch (93 kB, Chris Douglas)
        5. M64-2i.png (29 kB, Chris Douglas)
        6. M64-1i.png (32 kB, Chris Douglas)
        7. M64-0i.png (30 kB, Chris Douglas)
        8. M64-4.patch (106 kB, Chris Douglas)
        9. M64-5.patch (108 kB, Chris Douglas)
        10. M64-6.patch (108 kB, Chris Douglas)
        11. M64-7.patch (107 kB, Chris Douglas)
        12. M64-8.patch (108 kB, Chris Douglas)
        13. M64-9.patch (106 kB, Chris Douglas)
        14. M64-10.patch (114 kB, Chris Douglas)


            People

            • Assignee: Chris Douglas (chris.douglas)
            • Reporter: Arun C Murthy (acmurthy)
            • Votes: 0
            • Watchers: 25
