  Hadoop Map/Reduce / MAPREDUCE-6474

ShuffleHandler can possibly exhaust nodemanager file descriptors





      The asynchronous nature of the ShuffleHandler can cause it to open a huge number of
      file descriptors; when the NodeManager runs out of them, it crashes.

      Consider a job with 6K reduces, slow start set to 0.95, and about 40 map outputs per node.
      Suppose all 6K reduces hit a node at about the same time, asking for their
      outputs. Each reducer will ask for all 40 map outputs over a single socket in a
      single request (not necessarily all 40 at once, but with coalescing it is
      likely to be a large number).

      sendMapOutput() opens the file for random-access reading and then performs an
      asynchronous transfer of the requested portion of that file. In theory this
      happens 6000 * 40 = 240,000 times, which runs the NM out of file descriptors
      and causes it to crash.
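
      The worst-case arithmetic above can be written out as a quick check. This is only a back-of-the-envelope sketch: the class and method names are hypothetical, and the descriptor limit used in the comment is an assumed example value, not one taken from this report.

```java
// Hypothetical helper for illustration; not part of Hadoop.
class FdEstimate {

    // Worst case: every reducer has every map output on the node open at once.
    static long worstCaseOpenFiles(long reducers, long mapOutputsPerNode) {
        return Math.multiplyExact(reducers, mapOutputsPerNode);
    }

    // True when the estimated number of open files exceeds the process limit.
    static boolean exceedsLimit(long openFiles, long fdLimit) {
        return openFiles > fdLimit;
    }

    public static void main(String[] args) {
        long open = worstCaseOpenFiles(6000, 40);
        // 65536 is an assumed per-process limit, chosen only as an example.
        System.out.println(open + " open files, over limit: " + exceedsLimit(open, 65536));
    }
}
```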

      The algorithm should be refactored a little to not open the fds until they're
      actually needed.
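
      One way to picture "not opening the fds until they're actually needed" is to queue lightweight descriptions of the requested byte ranges and open each file only when its turn to be sent arrives. The sketch below is a simplified, synchronous illustration using plain java.nio. It is not the ShuffleHandler code and not the approach taken in the attached patches; LazyMapOutputSender, Segment, and demo() are hypothetical names.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.channels.Channels;
import java.nio.channels.FileChannel;
import java.nio.channels.WritableByteChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayDeque;
import java.util.Queue;

// Hypothetical illustration only; not the ShuffleHandler code or any attached patch.
class LazyMapOutputSender {

    // Describes a byte range of a map output file without holding the file open.
    static final class Segment {
        final Path file;
        final long offset;
        final long length;

        Segment(Path file, long offset, long length) {
            this.file = file;
            this.offset = offset;
            this.length = length;
        }
    }

    private final Queue<Segment> pending = new ArrayDeque<>();

    // Queuing a segment costs no file descriptor.
    void enqueue(Segment segment) {
        pending.add(segment);
    }

    // Sends queued segments in order; at most one file is open at any moment.
    long sendAll(WritableByteChannel out) throws IOException {
        long total = 0;
        Segment s;
        while ((s = pending.poll()) != null) {
            // Open only when this segment's turn comes; try-with-resources closes it.
            try (FileChannel in = FileChannel.open(s.file, StandardOpenOption.READ)) {
                long sent = 0;
                while (sent < s.length) {
                    long n = in.transferTo(s.offset + sent, s.length - sent, out);
                    if (n <= 0) {
                        break; // nothing more could be transferred for this range
                    }
                    sent += n;
                }
                total += sent;
            }
        }
        return total;
    }

    // Small demo: two temp files, one range from each, written to an in-memory sink.
    static String demo() {
        try {
            Path a = Files.createTempFile("mapout-a", ".bin");
            Path b = Files.createTempFile("mapout-b", ".bin");
            try {
                Files.write(a, "abcdefgh".getBytes(StandardCharsets.US_ASCII));
                Files.write(b, "STUVWXYZ".getBytes(StandardCharsets.US_ASCII));
                LazyMapOutputSender sender = new LazyMapOutputSender();
                sender.enqueue(new Segment(a, 2, 3));
                sender.enqueue(new Segment(b, 4, 2));
                ByteArrayOutputStream sink = new ByteArrayOutputStream();
                sender.sendAll(Channels.newChannel(sink));
                return new String(sink.toByteArray(), StandardCharsets.US_ASCII);
            } finally {
                Files.deleteIfExists(a);
                Files.deleteIfExists(b);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

      In this sketch the number of simultaneously open files per connection stays at one regardless of how many segments are queued. A real asynchronous server would instead open the next file from a write-completion callback, but the ordering idea is the same.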


        1. YARN-2410-v9.patch
          15 kB
          Kuhu Shukla
        2. YARN-2410-v8.patch
          14 kB
          Kuhu Shukla
        3. YARN-2410-v7.patch
          15 kB
          Kuhu Shukla
        4. YARN-2410-v6.patch
          16 kB
          Kuhu Shukla
        5. YARN-2410-v5.patch
          16 kB
          Kuhu Shukla
        6. YARN-2410-v4.patch
          11 kB
          Kuhu Shukla
        7. YARN-2410-v3.patch
          18 kB
          Kuhu Shukla
        8. YARN-2410-v2.patch
          18 kB
          Kuhu Shukla
        9. YARN-2410-v11.patch
          16 kB
          Kuhu Shukla
        10. YARN-2410-v10.patch
          15 kB
          Kuhu Shukla
        11. YARN-2410-v1.patch
          18 kB
          Kuhu Shukla




              • Assignee:
                kshukla Kuhu Shukla
                nroberts Nathan Roberts
              • Votes:
                0
              • Watchers:
                13


                • Created: