Details

    • Hadoop Flags:
      Reviewed

      Description

      YARN-4676 implements an automatic, asynchronous, and flexible mechanism to gracefully decommission
      YARN nodes. After a user issues the refreshNodes request, the ResourceManager (RM) automatically evaluates
      the status of all affected nodes and kicks off decommission or recommission actions. The RM asynchronously
      tracks the container and application status of DECOMMISSIONING nodes so that it can decommission each
      node as soon as it is ready to be decommissioned. A decommissioning timeout is supported at the
      granularity of individual nodes and can be updated dynamically. The mechanism naturally supports multiple
      independent graceful decommissioning “sessions”, each involving a different set of nodes with
      different timeout settings. Such support is both ideal and necessary when graceful decommission requests
      are issued by external cluster management software rather than by a human operator.
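The refresh evaluation described above can be illustrated with a minimal sketch. It is not code from the YARN-4676 patches: the class `RefreshNodesSketch`, its enums, and the `evaluate` method are hypothetical names, and the sketch assumes the exclude list is available as a map from host name to a per-node timeout in seconds. It also simplifies by treating every non-running node that has left the exclude list as a recommission candidate.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch of a refreshNodes evaluation; not code from the YARN-4676 patch. */
final class RefreshNodesSketch {

    enum NodeState { RUNNING, DECOMMISSIONING, DECOMMISSIONED }

    enum ActionType { GRACEFUL_DECOMMISSION, UPDATE_TIMEOUT, RECOMMISSION }

    /** One action decided for a single host during a refresh. */
    static final class Action {
        final String host;
        final ActionType type;
        final int timeoutSeconds; // -1 when the action carries no timeout

        Action(String host, ActionType type, int timeoutSeconds) {
            this.host = host;
            this.type = type;
            this.timeoutSeconds = timeoutSeconds;
        }

        @Override
        public String toString() {
            return type + "(" + host + ", timeout=" + timeoutSeconds + "s)";
        }
    }

    /**
     * Compares each node's current state with the (assumed) exclude list and
     * returns the actions a refresh would kick off.
     */
    static List<Action> evaluate(Map<String, NodeState> currentStates,
                                 Map<String, Integer> excludedTimeouts) {
        List<Action> actions = new ArrayList<>();
        // Sort by host name so the resulting action list is deterministic.
        Map<String, NodeState> sorted = new TreeMap<>(currentStates);
        for (Map.Entry<String, NodeState> entry : sorted.entrySet()) {
            String host = entry.getKey();
            NodeState state = entry.getValue();
            Integer timeout = excludedTimeouts.get(host);
            if (timeout != null) {
                if (state == NodeState.RUNNING) {
                    // Newly excluded node: start a graceful decommission.
                    actions.add(new Action(host, ActionType.GRACEFUL_DECOMMISSION, timeout));
                } else if (state == NodeState.DECOMMISSIONING) {
                    // Already draining: only the timeout may have changed.
                    actions.add(new Action(host, ActionType.UPDATE_TIMEOUT, timeout));
                }
                // Already DECOMMISSIONED and still excluded: nothing to do.
            } else if (state != NodeState.RUNNING) {
                // No longer excluded: bring the node back.
                actions.add(new Action(host, ActionType.RECOMMISSION, -1));
            }
        }
        return actions;
    }

    public static void main(String[] args) {
        Map<String, NodeState> states = new TreeMap<>();
        states.put("node1", NodeState.RUNNING);
        states.put("node2", NodeState.DECOMMISSIONING);
        states.put("node3", NodeState.DECOMMISSIONED);
        Map<String, Integer> excluded = new TreeMap<>();
        excluded.put("node1", 3600);
        excluded.put("node2", 600);
        for (Action action : evaluate(states, excluded)) {
            System.out.println(action);
        }
    }
}
```

Because each host carries its own timeout, two refresh requests that name different hosts behave as independent "sessions" in this sketch, and a later request can change the timeout of a node that is already draining.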

      DecommissioningNodeWatcher, inside ResourceTrackingService, tracks the status of DECOMMISSIONING nodes automatically and asynchronously after a client or admin makes the graceful decommission request. It uses that status to decide when a node, once all running containers on it have completed, is transitioned into the DECOMMISSIONED state. NodesListManager detects and handles changes to the include and exclude lists and kicks off decommission or recommission as necessary.
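The tracking decision can be sketched as follows. This is a simplified illustration under stated assumptions, not the DecommissioningNodeWatcher implementation: the class `DecommissionWatcherSketch` and its methods are hypothetical, only the running container count is tracked (the description also mentions application status), and the current time is passed in explicitly so that the logic is easy to exercise.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of per-node decommission tracking; not the YARN-4676 code. */
final class DecommissionWatcherSketch {

    enum Decision { WAIT, READY, TIMED_OUT }

    /** Mutable tracking record for one DECOMMISSIONING node. */
    private static final class Tracked {
        int runningContainers;
        long deadlineMillis;
    }

    private final Map<String, Tracked> tracked = new HashMap<>();

    /** Starts tracking a node, or resets the deadline of a node already tracked. */
    void track(String host, int runningContainers, long nowMillis, long timeoutMillis) {
        Tracked record = tracked.computeIfAbsent(host, h -> new Tracked());
        record.runningContainers = runningContainers;
        record.deadlineMillis = nowMillis + timeoutMillis;
    }

    /** Records the container count reported by the node's latest heartbeat. */
    void onHeartbeat(String host, int runningContainers) {
        Tracked record = tracked.get(host);
        if (record != null) {
            record.runningContainers = runningContainers;
        }
    }

    /** Decides whether the node can move to DECOMMISSIONED at the given time. */
    Decision check(String host, long nowMillis) {
        Tracked record = tracked.get(host);
        if (record == null) {
            throw new IllegalArgumentException("not tracked: " + host);
        }
        if (record.runningContainers == 0) {
            return Decision.READY;      // drained: safe to decommission now
        }
        if (nowMillis >= record.deadlineMillis) {
            return Decision.TIMED_OUT;  // per-node timeout expired
        }
        return Decision.WAIT;           // containers still running, time remains
    }

    /** Stops tracking a node, for example after it is decommissioned or recommissioned. */
    void untrack(String host) {
        tracked.remove(host);
    }

    public static void main(String[] args) {
        DecommissionWatcherSketch watcher = new DecommissionWatcherSketch();
        watcher.track("node1", 2, 0L, 60_000L);
        System.out.println(watcher.check("node1", 10_000L));
        watcher.onHeartbeat("node1", 0);
        System.out.println(watcher.check("node1", 20_000L));
    }
}
```

Calling `track` again for a host that is already tracked resets its deadline, which is one plausible way to model a timeout that is updated dynamically while the node drains.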

        Attachments

        1. YARN-4676.024.patch
          110 kB
          Daniel Zhi
        2. YARN-4676.023.patch
          108 kB
          Daniel Zhi
        3. YARN-4676.022.patch
          102 kB
          Daniel Zhi
        4. YARN-4676.021.patch
          99 kB
          Daniel Zhi
        5. YARN-4676.020.patch
          103 kB
          Daniel Zhi
        6. YARN-4676.019.patch
          103 kB
          Daniel Zhi
        7. YARN-4676.018.patch
          103 kB
          Daniel Zhi
        8. YARN-4676.017.patch
          100 kB
          Daniel Zhi
        9. YARN-4676.016.patch
          100 kB
          Daniel Zhi
        10. YARN-4676.015.patch
          101 kB
          Daniel Zhi
        11. YARN-4676.014.patch
          98 kB
          Daniel Zhi
        12. YARN-4676.013.patch
          94 kB
          Daniel Zhi
        13. YARN-4676.012.patch
          94 kB
          Daniel Zhi
        14. YARN-4676.011.patch
          92 kB
          Daniel Zhi
        15. YARN-4676.010.patch
          84 kB
          Daniel Zhi
        16. YARN-4676.009.patch
          78 kB
          Daniel Zhi
        17. YARN-4676.008.patch
          78 kB
          Daniel Zhi
        18. YARN-4676.007.patch
          109 kB
          Daniel Zhi
        19. YARN-4676.006.patch
          105 kB
          Daniel Zhi
        20. YARN-4676.005.patch
          104 kB
          Daniel Zhi
        21. YARN-4676.004.patch
          105 kB
          Daniel Zhi
        22. GracefulDecommissionYarnNode.pdf
          372 kB
          Daniel Zhi
        23. GracefulDecommissionYarnNode.pdf
          476 kB
          Daniel Zhi

              People

              • Assignee:
                danzhi Daniel Zhi
                Reporter:
                danzhi Daniel Zhi
              • Votes:
                0
                Watchers:
                21
