Hadoop YARN / YARN-914

(Umbrella) Support graceful decommission of nodemanager

Details

    • Type: New Feature
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • 2.0.4-alpha
    • None
    • graceful
    • None

    Description

      When NMs are decommissioned for non-fault reasons (capacity change etc.), it's desirable to minimize the impact to running applications.

      Currently, if an NM is decommissioned, all containers running on it must be rescheduled on other NMs. Furthermore, any finished map tasks whose output has not yet been fetched by the job's reducers must be rerun as well.

      We propose introducing an optional mechanism to gracefully decommission a node manager.
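
To make the proposal concrete, here is a minimal sketch of what graceful decommission could look like as a state machine. It is not the YARN implementation. The `DECOMMISSIONING` state and the timeout come from the sub-task titles on this issue; every class, field, and function name below (`Node`, `start_graceful_decommission`, `poll`, `timeout_secs`) is hypothetical and chosen only for illustration. The assumption is that a draining node accepts no new containers and is retired once its containers have finished and its map output has been fetched, or once the timeout expires.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class NodeState(Enum):
    RUNNING = "RUNNING"
    DECOMMISSIONING = "DECOMMISSIONING"  # draining, no new work
    DECOMMISSIONED = "DECOMMISSIONED"


@dataclass
class Node:
    """Hypothetical model of a node manager as tracked by the RM."""
    running_containers: int = 0
    # Finished map tasks whose output reducers have not fetched yet.
    unfetched_map_outputs: int = 0
    state: NodeState = NodeState.RUNNING
    decommission_started_at: Optional[float] = None

    def can_accept_containers(self) -> bool:
        # Only a fully RUNNING node may be given new containers.
        return self.state is NodeState.RUNNING


def start_graceful_decommission(node: Node, now: float) -> None:
    """Move a RUNNING node into the draining state and record the start time."""
    if node.state is NodeState.RUNNING:
        node.state = NodeState.DECOMMISSIONING
        node.decommission_started_at = now


def poll(node: Node, now: float, timeout_secs: float) -> NodeState:
    """Retire a draining node once it is idle or the timeout has elapsed."""
    if node.state is not NodeState.DECOMMISSIONING:
        return node.state
    drained = node.running_containers == 0 and node.unfetched_map_outputs == 0
    timed_out = now - node.decommission_started_at >= timeout_secs
    if drained or timed_out:
        node.state = NodeState.DECOMMISSIONED
    return node.state
```

In this sketch the timeout bounds how long a capacity change can be delayed: a node that never drains is still retired after `timeout_secs`, at which point its remaining containers would have to be rescheduled as in the current behavior.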

        Sub-Tasks

        1. RM to inform AMs when a container completed due to NM going offline -planned or unplanned (Sub-task, Resolved, Rohith Sharma K S)
        2. RMNode State Transition Update with DECOMMISSIONING state (Sub-task, Resolved, Junping Du)
        3. Resource update during NM graceful decommission (Sub-task, Resolved, Brook Zhou)
        4. Notify AM with containers (on decommissioning node) could be preempted after timeout. (Sub-task, Resolved, Unassigned)
        5. New parameter or CLI for decommissioning node gracefully in RMAdmin CLI (Sub-task, Resolved, Devaraj Kavali)
        6. Automatic and Asynchronous Decommissioning Nodes Status Tracking (Sub-task, Resolved, Daniel Zhi)
        7. UI changes for decommissioning node (Sub-task, Resolved, Sunil G)
        8. RMNodeResourceUpdateEvent update from scheduler can lead to race condition (Sub-task, Resolved, Wilfred Spiegelenburg)
        9. Document graceful decommission CLI and usage (Sub-task, Resolved, Marton Elek)
        10. Add -client|server argument for graceful decom (Sub-task, Resolved, Robert Kanter)
        11. Server-Side NM Graceful Decommissioning with RM HA (Sub-task, Patch Available, Gergely Pollák)
        12. Server-Side NM Graceful Decommissioning subsequent call behavior (Sub-task, Open, Unassigned)
        13. Multiple format support (JSON, etc.) for exclude node file in NM graceful decommission with timeout (Sub-task, Open, Unassigned)
        14. Client-side NM graceful decom is not triggered when jobs finish (Sub-task, Resolved, Robert Kanter)
        15. Clarify DecommissionType.FORCEFUL comment (Sub-task, Resolved, Vrushali C)
        16. Document the current known issue with server-side NM graceful decom (Sub-task, Resolved, Robert Kanter)
        17. Remove XML excludes file format (Sub-task, Resolved, Robert Kanter)
        18. Better utilize gracefully decommissioning node managers (Sub-task, Open, Karthik Palaniappan)
        19. DecommissioningNodesWatcher should get lists of running applications on node from RMNode. (Sub-task, Resolved, Abhishek Modi)
        20. An easy method to exclude a nodemanager from the yarn cluster cleanly (Sub-task, Open, Unassigned)
        21. Add option to graceful decommission to not wait for applications (Sub-task, Patch Available, Mikayla Konst)
        22. RESOURCE_UPDATE event was repeatedly registered in DECOMMISSIONING state (Sub-task, Resolved, yehuanhuan)
        23. Avoid sending RMNodeResourceupdate event if resource is same (Sub-task, Resolved, Sushil Ks)

          People

            junping_du Junping Du
            vicaya Luke Lu
