Hadoop YARN / YARN-914

(Umbrella) Support graceful decommission of nodemanager

Details

    • Type: New Feature
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 2.0.4-alpha
    • Fix Version/s: None
    • Component/s: graceful
    • Labels: None

    Description

      When NMs are decommissioned for non-fault reasons (a capacity change, for example), it is desirable to minimize the impact on running applications.

      Currently, if an NM is decommissioned, all containers running on it must be rescheduled on other NMs. Furthermore, if the output of a finished map task has not yet been fetched by the job's reducers, that map task must be rerun as well.

      We propose an optional mechanism for gracefully decommissioning a node manager.
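
To make the proposal concrete, here is a minimal sketch of how a graceful decommission could behave. It assumes a node first enters a draining state (the DECOMMISSIONING state named in the sub-tasks), accepts no new containers, and is only fully decommissioned once its containers finish or a timeout expires. The `Node` class, its methods, and the timeout handling are hypothetical illustrations, not YARN's actual classes or API, and the sketch does not model unfetched map output.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional, Set


class NodeState(Enum):
    RUNNING = "RUNNING"
    DECOMMISSIONING = "DECOMMISSIONING"  # draining: no new containers
    DECOMMISSIONED = "DECOMMISSIONED"


@dataclass
class Node:
    """Hypothetical model of a node manager; not a real YARN class."""
    name: str
    state: NodeState = NodeState.RUNNING
    running_containers: Set[str] = field(default_factory=set)
    deadline: Optional[float] = None  # time at which draining gives up

    def can_schedule(self) -> bool:
        # Only a fully running node may receive new containers.
        return self.state is NodeState.RUNNING

    def start_graceful_decommission(self, now: float, timeout: float) -> None:
        # Begin draining and remember when the grace period ends.
        self.state = NodeState.DECOMMISSIONING
        self.deadline = now + timeout

    def container_finished(self, container_id: str) -> None:
        self.running_containers.discard(container_id)

    def poll(self, now: float) -> List[str]:
        """Advance the state machine; return containers that were cut off."""
        if self.state is not NodeState.DECOMMISSIONING:
            return []
        drained = not self.running_containers
        timed_out = self.deadline is not None and now >= self.deadline
        if not (drained or timed_out):
            return []  # keep waiting for containers to finish
        cut_off = sorted(self.running_containers)
        self.running_containers.clear()
        self.state = NodeState.DECOMMISSIONED
        return cut_off
```

In this sketch, a node whose containers all finish before the deadline is decommissioned without interrupting any work; only containers still running at the timeout would need to be rescheduled elsewhere.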

        Issue Links

          1. RM to inform AMs when a container completed due to NM going offline - planned or unplanned | Sub-task | Resolved | Rohith Sharma K S
          2. RMNode State Transition Update with DECOMMISSIONING state | Sub-task | Resolved | Junping Du
          3. Resource update during NM graceful decommission | Sub-task | Resolved | Brook Zhou
          4. Notify AM with containers (on decommissioning node) could be preempted after timeout. | Sub-task | Resolved | Unassigned
          5. New parameter or CLI for decommissioning node gracefully in RMAdmin CLI | Sub-task | Resolved | Devaraj Kavali
          6. Automatic and Asynchronous Decommissioning Nodes Status Tracking | Sub-task | Resolved | Daniel Zhi
          7. UI changes for decommissioning node | Sub-task | Resolved | Sunil G
          8. RMNodeResourceUpdateEvent update from scheduler can lead to race condition | Sub-task | Resolved | Wilfred Spiegelenburg
          9. Document graceful decommission CLI and usage | Sub-task | Resolved | Marton Elek
          10. Add -client|server argument for graceful decom | Sub-task | Resolved | Robert Kanter
          11. Server-Side NM Graceful Decommissioning with RM HA | Sub-task | Patch Available | Gergely Pollák
          12. Server-Side NM Graceful Decommissioning subsequent call behavior | Sub-task | Open | Unassigned
          13. Multiple format support (JSON, etc.) for exclude node file in NM graceful decommission with timeout | Sub-task | Open | Unassigned
          14. Client-side NM graceful decom is not triggered when jobs finish | Sub-task | Resolved | Robert Kanter
          15. Clarify DecommissionType.FORCEFUL comment | Sub-task | Resolved | Vrushali C
          16. Document the current known issue with server-side NM graceful decom | Sub-task | Resolved | Robert Kanter
          17. Remove XML excludes file format | Sub-task | Resolved | Robert Kanter
          18. Better utilize gracefully decommissioning node managers | Sub-task | Open | Karthik Palaniappan
          19. DecommissioningNodesWatcher should get lists of running applications on node from RMNode. | Sub-task | Resolved | Abhishek Modi
          20. An easy method to exclude a nodemanager from the yarn cluster cleanly | Sub-task | Open | Unassigned
          21. Add option to graceful decommission to not wait for applications | Sub-task | Patch Available | Mikayla Konst
          22. RESOURCE_UPDATE event was repeatedly registered in DECOMMISSIONING state | Sub-task | Resolved | yehuanhuan
          23. Avoid sending RMNodeResourceupdate event if resource is same | Sub-task | Resolved | Sushil Ks

            People

              Assignee: Junping Du (junping_du)
              Reporter: Luke Lu (vicaya)
              Votes: 3
              Watchers: 74
