  1. Hadoop Map/Reduce
  2. MAPREDUCE-6474

ShuffleHandler can possibly exhaust nodemanager file descriptors

    Details

    • Target Version/s:
    • Hadoop Flags:
      Reviewed

      Description

      The asynchronous nature of the ShuffleHandler can cause it to open a huge
      number of file descriptors; when the NodeManager runs out of them, it crashes.

      Scenario:
      Job with 6K reduces, slow start set to 0.95, about 40 map outputs per node.
      Let's say all 6K reduces hit a node at about the same time asking for their
      outputs. Each reducer will ask for all 40 map outputs over a single socket in a
      single request (not necessarily all 40 at once, but with coalescing it is
      likely to be a large number).

      sendMapOutput() will open the file for random reading and then perform an
      async transfer of the particular portion of this file. This will theoretically
      happen 6000*40=240000 times, which will run the NM out of file descriptors and
      cause it to crash.

      The algorithm should be refactored a little to not open the fds until they're
      actually needed.
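
The idea of deferring the open can be illustrated with a minimal sketch. This is not the ShuffleHandler code or the attached patch: `LazyFileSegment` and its `OPEN_FDS` counter are hypothetical names invented for this example, and it uses only plain `java.nio` rather than the Netty classes the real handler is built on. Each segment records the path, offset, and length when it is queued, but opens the file only when the transfer actually starts and closes it as soon as the transfer ends.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.channels.FileChannel;
import java.nio.channels.WritableByteChannel;
import java.nio.file.Path;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical helper (not part of Hadoop): a file segment that opens its descriptor lazily. */
public class LazyFileSegment {
    // Number of descriptors currently held open by segments; for illustration only.
    static final AtomicInteger OPEN_FDS = new AtomicInteger();

    private final Path file;
    private final long position;
    private final long count;

    /** Queuing a segment records where the data lives but does not open the file. */
    public LazyFileSegment(Path file, long position, long count) {
        this.file = file;
        this.position = position;
        this.count = count;
    }

    /** Opens the file only now, copies the segment to the target, then closes the file. */
    public long transferTo(WritableByteChannel target) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "r");
             FileChannel channel = raf.getChannel()) {
            OPEN_FDS.incrementAndGet();
            try {
                long written = 0;
                while (written < count) {
                    long n = channel.transferTo(position + written, count - written, target);
                    if (n <= 0) {
                        break; // end of file reached, or the target accepted nothing
                    }
                    written += n;
                }
                return written;
            } finally {
                OPEN_FDS.decrementAndGet();
            }
        }
    }
}
```

With this shape, queuing 240000 segments costs no file descriptors at all; the number held open at any moment is bounded by the number of transfers actually in flight.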

        Attachments

        1. YARN-2410-v11.patch
          16 kB
          Kuhu Shukla
        2. YARN-2410-v10.patch
          15 kB
          Kuhu Shukla
        3. YARN-2410-v9.patch
          15 kB
          Kuhu Shukla
        4. YARN-2410-v8.patch
          14 kB
          Kuhu Shukla
        5. YARN-2410-v7.patch
          15 kB
          Kuhu Shukla
        6. YARN-2410-v6.patch
          16 kB
          Kuhu Shukla
        7. YARN-2410-v5.patch
          16 kB
          Kuhu Shukla
        8. YARN-2410-v4.patch
          11 kB
          Kuhu Shukla
        9. YARN-2410-v3.patch
          18 kB
          Kuhu Shukla
        10. YARN-2410-v2.patch
          18 kB
          Kuhu Shukla
        11. YARN-2410-v1.patch
          18 kB
          Kuhu Shukla

              People

              • Assignee:
                kshukla Kuhu Shukla
                Reporter:
                nroberts Nathan Roberts