Hadoop YARN / YARN-1073

NM to recognise when it can't spawn process and stop accepting containers

    Details

    • Type: Improvement
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: 2.1.0-beta
    • Fix Version/s: None
    • Component/s: nodemanager
    • Labels: None
    • Environment: OS/X with not enough file handles

      Description

      When creating too many containers with a claimed resource use of 0 RAM or vCores, the NM got into a state where exec() was continually failing, but nothing seemed to recognise this and blacklist the node.

      Something should notice that all container launches for an app/container are failing and act on it. While AMs can and should handle this themselves, NM failure is a YARN-level concern.
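
      To illustrate the kind of check being asked for, here is a minimal sketch of a tracker that counts consecutive launch failures and reports when the node should stop accepting containers. The class LaunchFailureTracker, its methods, and the threshold are all hypothetical and are not part of YARN. How such a signal would be wired into the NM and reported to the RM is left open.

```java
import java.util.concurrent.atomic.AtomicInteger;

/**
 * Hypothetical helper (not a YARN class): tracks consecutive container
 * launch failures so a node can stop accepting work once launches
 * appear to be failing systematically, e.g. because exec() keeps failing.
 */
class LaunchFailureTracker {
    // Assumed policy: this many failures in a row marks the node as unusable.
    private final int maxConsecutiveFailures;
    private final AtomicInteger consecutiveFailures = new AtomicInteger();

    LaunchFailureTracker(int maxConsecutiveFailures) {
        if (maxConsecutiveFailures < 1) {
            throw new IllegalArgumentException("threshold must be >= 1");
        }
        this.maxConsecutiveFailures = maxConsecutiveFailures;
    }

    /** Call after a container process was spawned successfully. */
    void recordSuccess() {
        consecutiveFailures.set(0);
    }

    /** Call when spawning a container process failed. */
    void recordFailure() {
        consecutiveFailures.incrementAndGet();
    }

    /** True while the failure streak is below the threshold. */
    boolean isAcceptingContainers() {
        return consecutiveFailures.get() < maxConsecutiveFailures;
    }
}
```

      In this sketch a single successful launch resets the streak, so the node starts accepting containers again. A real implementation might prefer a time window or an explicit recovery probe instead.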

              People

              • Assignee: Unassigned
              • Reporter: Steve Loughran (stevel@apache.org)
              • Votes: 1
              • Watchers: 15
