Hadoop YARN / YARN-10896

RM failover does not report DECOMMISSIONED nodes

Details

    Description

      To decommission a NodeManager, we add its host to the exclude file and run the yarn rmadmin -refreshNodes command, which transitions the node from RUNNING to DECOMMISSIONED. However, if a failover to the standby ResourceManager happens while the exclude file still lists the disallowed hosts, those nodes never appear in the Cluster Metrics on the new active ResourceManager.

      When the ResourceManager is restarted, the hosts present in the exclude file are listed in the Cluster Metrics, because they are picked up as part of the service init of NodeListManager. During a failover, however, this information is lost. This patch therefore sets the DECOMMISSIONED nodes in the RM context whenever the yarn rmadmin -refreshNodes command is issued, so that they are available through the Cluster Metrics.
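
      The behaviour described above can be illustrated with a small toy model. This is only a sketch under stated assumptions, not the code in the attached patch: ToyRmContext, ToyNodeListManager, recordOnRefresh and decommissionedCount are hypothetical names that stand in for the RM context, NodeListManager and the Cluster Metrics count, and the failover is simplified to "the new active ResourceManager starts with no recorded inactive nodes".

```java
import java.util.Collections;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.concurrent.ConcurrentHashMap;

/** Toy model of the reported behaviour; none of these are real YARN classes. */
public class DecommissionSketch {

    /** Hypothetical stand-in for the RM context's record of inactive nodes. */
    static final class ToyRmContext {
        // host name -> node state, e.g. "DECOMMISSIONED"
        final Map<String, String> inactiveNodes = new ConcurrentHashMap<>();
    }

    /** Hypothetical stand-in for NodeListManager. */
    static final class ToyNodeListManager {
        private final ToyRmContext context;
        // true models the idea of the patch, false models the reported behaviour
        private final boolean recordOnRefresh;
        private Set<String> excludedHosts = Collections.emptySet();

        ToyNodeListManager(ToyRmContext context, boolean recordOnRefresh) {
            this.context = context;
            this.recordOnRefresh = recordOnRefresh;
        }

        /** Models service init on restart: excluded hosts are recorded. */
        void serviceInit(Set<String> excludeFileHosts) {
            excludedHosts = new TreeSet<>(excludeFileHosts);
            recordDecommissioned();
        }

        /** Models what happens when `yarn rmadmin -refreshNodes` is issued. */
        void refreshNodes(Set<String> excludeFileHosts) {
            excludedHosts = new TreeSet<>(excludeFileHosts);
            if (recordOnRefresh) {
                recordDecommissioned();
            }
        }

        private void recordDecommissioned() {
            for (String host : excludedHosts) {
                context.inactiveNodes.put(host, "DECOMMISSIONED");
            }
        }
    }

    /** The DECOMMISSIONED count that Cluster Metrics would show in this model. */
    static long decommissionedCount(ToyRmContext context) {
        return context.inactiveNodes.values().stream()
                .filter("DECOMMISSIONED"::equals)
                .count();
    }
}
```

      In this model, a manager created with recordOnRefresh set to false leaves the count at 0 after refreshNodes on a fresh context (the failover case), while serviceInit, or refreshNodes with recordOnRefresh set to true, records every host from the exclude file.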

      Attachments

        1. YARN-10896.001.patch
          2 kB
          Sushil Ks

            People

              Assignee: Sushil Ks (Sushil-K-S)
              Reporter: Sushil Ks (Sushil-K-S)
              Votes: 0
              Watchers: 6
