  1. Hadoop Map/Reduce
  2. MAPREDUCE-5946

Last spill of map task is not necessary for final merge


Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 2.4.0
    • Fix Version/s: None
    • Component/s: performance, task

    Description

      In a map task, the final merge starts only after the last spill has been completely written to disk. This is neither necessary nor efficient: the last spill has to be read back for the merge soon afterwards, probably immediately, because spills are merged in order of size and the last spill is likely to be the smallest. Relying on the OS page cache is not the answer, because the cache is opportunistic and gives no guarantee that the spill is still in memory when the merge reads it.

      I'm starting to work on this. Please give me your thoughts.
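
      To make the argument above concrete, here is a minimal Python sketch of the merge order, not the actual Hadoop merge code. It assumes that each merge pass picks the smallest segments first, as the description says. The names `Segment`, `first_merge_pass`, and `disk_bytes_for_last_spill` are hypothetical and exist only for illustration, and the sizes are made-up numbers.

```python
import heapq
from dataclasses import dataclass


@dataclass(frozen=True, order=True)
class Segment:
    """A spill segment; `size` comes first so segments order by size."""
    size: int   # bytes (illustrative numbers only)
    name: str


def first_merge_pass(segments, factor):
    """Pick the segments for the first merge pass.

    Assumption: the merge takes the `factor` smallest segments first.
    """
    return heapq.nsmallest(factor, segments)


def disk_bytes_for_last_spill(last, keep_in_memory):
    """Disk traffic caused by the last spill before it is merged.

    Written once and read back once if it goes to disk;
    none if it is handed to the merge directly from memory.
    """
    return 0 if keep_in_memory else 2 * last.size


# Three full spills followed by a smaller final spill.
spills = [
    Segment(100, "spill0"),
    Segment(100, "spill1"),
    Segment(100, "spill2"),
    Segment(30, "spill3"),   # last spill: whatever was left in the buffer
]
last = spills[-1]

picked = first_merge_pass(spills, factor=2)
print([s.name for s in picked])                 # the last spill is picked first
print(disk_bytes_for_last_spill(last, False))   # current behaviour
print(disk_bytes_for_last_spill(last, True))    # proposed behaviour
```

      With these numbers the last spill is the first segment the merge reads, so writing it to disk only to read it straight back costs twice its size in disk I/O that the proposal would avoid.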

          People

            Assignee: jaehoon13.ko jaehoon ko
            Reporter: jaehoon13.ko jaehoon ko
            Votes: 0
            Watchers: 5

            Dates

              Created:
              Updated: