Hadoop Map/Reduce / MAPREDUCE-5168

Reducer can OOM during shuffle because on-disk output stream not released

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 0.23.7
    • Fix Version/s: 0.23.8
    • Component/s: mrv2
    • Labels: None
    • Hadoop Flags: Reviewed

    Description

      If a reducer needs to shuffle a map output to disk, it opens an output stream and writes the data to disk. However, it never releases the reference to the output stream held within the MapOutput, and the output stream can have a 128K buffer attached to it. If enough of these on-disk outputs are queued up waiting to be merged, the retained buffers can cause the reducer to OOM during the shuffle phase. In one case I saw, 1200 on-disk outputs were queued up to be merged, adding roughly 150MB of heap pressure (1200 x 128K) from output stream buffers that were no longer needed.
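
      The leak pattern and one way to avoid it can be sketched as follows. This is a minimal illustration, not the attached patch: the class OnDiskOutputSketch, the nested OnDiskOutput type, and the helper retainedBytes are hypothetical names, and a ByteArrayOutputStream stands in for the on-disk file. The idea is to close the stream and drop the field reference as soon as the write finishes, so the 128K buffer can be garbage collected while the output waits in the merge queue.

```java
import java.io.BufferedOutputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;

public class OnDiskOutputSketch {
    // Buffer size taken from the description (128K per output stream).
    static final int BUFFER_BYTES = 128 * 1024;

    /** Hypothetical stand-in for an on-disk map output; not Hadoop's MapOutput. */
    static final class OnDiskOutput {
        // While this field is non-null, the stream's buffer stays reachable.
        private OutputStream disk;

        OnDiskOutput(OutputStream sink) {
            this.disk = new BufferedOutputStream(sink, BUFFER_BYTES);
        }

        /** Writes the fetched data, then closes and releases the stream. */
        void shuffle(byte[] data) throws IOException {
            try (OutputStream out = disk) {
                out.write(data);
            } finally {
                // Drop the reference so the buffer is collectable even though
                // this object may sit in the merge queue for a long time.
                disk = null;
            }
        }

        boolean holdsStream() {
            return disk != null;
        }
    }

    /** Heap retained by buffers if every queued output keeps its stream. */
    static long retainedBytes(int queuedOutputs, int bufferBytes) {
        return (long) queuedOutputs * bufferBytes;
    }

    public static void main(String[] args) throws IOException {
        OnDiskOutput output = new OnDiskOutput(new ByteArrayOutputStream());
        output.shuffle(new byte[] {1, 2, 3});
        System.out.println("stream still referenced: " + output.holdsStream());

        long bytes = retainedBytes(1200, BUFFER_BYTES);
        System.out.println("retained without release: " + bytes / (1024 * 1024) + " MiB");
    }
}
```

      With the numbers from the description, retainedBytes(1200, 128 * 1024) is 157,286,400 bytes, which is the 150MB figure quoted above.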

      Attachments

        1. MAPREDUCE-5168.patch
          6 kB
          Jason Darrell Lowe
        2. MAPREDUCE-5168-branch-0.23.patch
          5 kB
          Jason Darrell Lowe

          People

            Assignee: Jason Darrell Lowe (jlowe)
            Reporter: Jason Darrell Lowe (jlowe)
            Votes: 0
            Watchers: 9

            Dates

              Created:
              Updated:
              Resolved: