  1. Hadoop Map/Reduce

Map-side sort is hampered by io.sort.record.percent



    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.21.0
    • Component/s: performance, task
    • Labels: None
    • Hadoop Flags: Reviewed


      Currently, io.sort.record.percent is a fairly obscure, per-job configurable, expert-level parameter that controls how much accounting space is available for records in the map-side sort buffer (io.sort.mb). Typical values for io.sort.mb (100) and io.sort.record.percent (0.05) imply that we can store ~350,000 records in the buffer before necessitating a sort/combine/spill.
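      The record limit can be approximated with a short calculation. This is only a sketch: it assumes 16 bytes of accounting space per record, a figure not stated in this issue, and max_records is a hypothetical helper name. With that assumption the defaults give a limit in the same ballpark as the ~350,000 quoted above.

```python
MB = 1024 * 1024

# Assumption (not stated in this issue): each buffered record consumes
# 16 bytes of accounting space in the map-side sort buffer.
ACCOUNTING_BYTES_PER_RECORD = 16


def max_records(io_sort_mb: int = 100, record_percent: float = 0.05) -> int:
    """Records that fit in the accounting part of the sort buffer."""
    accounting_bytes = round(io_sort_mb * MB * record_percent)
    return accounting_bytes // ACCOUNTING_BYTES_PER_RECORD


if __name__ == "__main__":
    print(max_records())  # defaults: io.sort.mb=100, io.sort.record.percent=0.05
```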

      However, for many applications that deal with small records, e.g. the world-famous wordcount and its family, this implies we can use only 5-10% of io.sort.mb (i.e. 5-10 MB) before we spill, in spite of having much more memory available in the sort buffer. Wordcount, for example, results in ~12 spills (given an HDFS block size of 64 MB). The presence of a combiner exacerbates the problem by piling on serialization/deserialization of records too...
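      To illustrate the 5-10% figure, the following sketch estimates how much record data is in the buffer when the accounting space runs out. It assumes 16 bytes of accounting per record and average serialized record sizes of 16 and 32 bytes; both are illustrative assumptions, not measurements from this issue, and it ignores any separate spill threshold.

```python
MB = 1024 * 1024
ACCOUNTING_BYTES_PER_RECORD = 16  # assumption, not stated in this issue


def data_bytes_at_record_limit(avg_record_bytes: int,
                               io_sort_mb: int = 100,
                               record_percent: float = 0.05) -> int:
    """Bytes of record data buffered when the accounting space is exhausted."""
    total = io_sort_mb * MB
    accounting = round(total * record_percent)
    data_capacity = total - accounting
    records = accounting // ACCOUNTING_BYTES_PER_RECORD
    # The data region can also fill first; that is the case for large records.
    return min(records * avg_record_bytes, data_capacity)


if __name__ == "__main__":
    for size in (16, 32):  # hypothetical small-record sizes
        used = data_bytes_at_record_limit(size)
        print(f"{size}-byte records: {used / MB:.0f} MB of a 100 MB buffer used")
```

      Under these assumptions, small records trigger a spill with roughly 5 to 10 MB of data buffered, while most of the data region sits unused.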

      Sure, jobs can configure io.sort.record.percent, but it's tedious and obscure; we really can do better by getting the framework to automagically pick it by using all available memory (up to io.sort.mb) for either the data or accounting.
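      As a rough illustration of why a fixed value is awkward to tune by hand, the sketch below computes the split at which the data and accounting regions would fill at the same moment for a given average record size. It is not the approach taken in the attached patches; balanced_record_percent is a hypothetical helper, and the 16 bytes of accounting per record is an assumption.

```python
ACCOUNTING_BYTES_PER_RECORD = 16  # assumption, not stated in this issue


def balanced_record_percent(avg_record_bytes: float) -> float:
    """Fraction of the buffer to give to accounting so that both regions
    fill together: each record costs 16 accounting bytes plus its data."""
    if avg_record_bytes <= 0:
        raise ValueError("avg_record_bytes must be positive")
    return ACCOUNTING_BYTES_PER_RECORD / (
        ACCOUNTING_BYTES_PER_RECORD + avg_record_bytes
    )


if __name__ == "__main__":
    for size in (16, 100, 304, 1000):  # hypothetical record sizes in bytes
        print(f"{size:>5}-byte records: {balanced_record_percent(size):.3f}")
```

      Under these assumptions the default of 0.05 is balanced only for records of about 304 bytes, and the right value depends on data the job author rarely knows in advance.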


        1. M64-0.patch
          80 kB
          Christopher Douglas
        2. M64-1.patch
          86 kB
          Christopher Douglas
        3. M64-2.patch
          89 kB
          Christopher Douglas
        4. M64-3.patch
          93 kB
          Christopher Douglas
        5. M64-2i.png
          29 kB
          Christopher Douglas
        6. M64-1i.png
          32 kB
          Christopher Douglas
        7. M64-0i.png
          30 kB
          Christopher Douglas
        8. M64-4.patch
          106 kB
          Christopher Douglas
        9. M64-5.patch
          108 kB
          Christopher Douglas
        10. M64-6.patch
          108 kB
          Christopher Douglas
        11. M64-7.patch
          107 kB
          Christopher Douglas
        12. M64-8.patch
          108 kB
          Christopher Douglas
        13. M64-9.patch
          106 kB
          Christopher Douglas
        14. M64-10.patch
          114 kB
          Christopher Douglas



            cdouglas Christopher Douglas
            acmurthy Arun Murthy