Hadoop YARN / YARN-8729

Node status updater thread could be lost after it is restarted

    Details

    • Hadoop Flags:
      Reviewed

      Description

      Today I found a lost NM whose node status updater thread no longer existed after the thread had been restarted. In NodeStatusUpdaterImpl#rebootNodeStatusUpdaterAndRegisterWithRM, the isStopped flag is not reset to false until after statusUpdater.start() has been called, so if the new thread starts running immediately and sees isStopped == true, it exits without logging anything.

      Key code in NodeStatusUpdaterImpl#rebootNodeStatusUpdaterAndRegisterWithRM:

       statusUpdater.join();
       registerWithRM();
       statusUpdater = new Thread(statusUpdaterRunnable, "Node Status Updater");
       statusUpdater.start();
       this.isStopped = false;   //this line should be moved before statusUpdater.start();
       LOG.info("NodeStatusUpdater thread is reRegistered and restarted");
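      A minimal, self-contained Java sketch of the race described above, for illustration only. The class StatusUpdaterRestartSketch, its methods and its heartbeat counter are hypothetical and are not the real NodeStatusUpdaterImpl code: registerWithRM() and the RM heartbeat are stubbed out, and stopping the old thread is assumed to work by setting isStopped to true and joining it. restartBuggyWorstCase() forces the unlucky interleaving deterministically (the new thread runs to completion before the flag is cleared); in the real NodeManager this interleaving only happens some of the time.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical, simplified model of the restart path; not the real Hadoop code.
class StatusUpdaterRestartSketch {
  // volatile so the updater thread sees flag changes made by the restarting thread
  private volatile boolean isStopped = false;
  private final AtomicInteger heartbeats = new AtomicInteger();
  private Thread statusUpdater;

  // The loop checks the flag first; if it is already true, the thread returns
  // immediately without logging anything, which is the reported symptom.
  private final Runnable statusUpdaterRunnable = () -> {
    while (!isStopped) {
      heartbeats.incrementAndGet(); // stand-in for one heartbeat to the RM
      try {
        Thread.sleep(10); // stand-in for the heartbeat interval
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        return;
      }
    }
  };

  private void startNewUpdaterThread() {
    statusUpdater = new Thread(statusUpdaterRunnable, "Node Status Updater");
    statusUpdater.setDaemon(true); // keep the sketch from blocking JVM exit
    statusUpdater.start();
  }

  private static void joinQuietly(Thread t) {
    try {
      t.join(); // wait until the thread has exited
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
  }

  public void start() {
    startNewUpdaterThread();
  }

  // Assumption: the old thread is stopped by setting the flag and joining it.
  public void stopAndJoin() {
    isStopped = true;
    joinQuietly(statusUpdater);
  }

  // Reported ordering, forced into its worst-case interleaving: the new thread
  // sees isStopped == true and exits before the flag is cleared.
  public void restartBuggyWorstCase() {
    startNewUpdaterThread();
    joinQuietly(statusUpdater); // simulate the new thread being scheduled first
    this.isStopped = false;     // too late: no updater thread is left running
  }

  // Fixed ordering: clear the flag before start(), so the new thread sees false.
  public void restartFixed() {
    this.isStopped = false;
    startNewUpdaterThread();
  }

  public boolean isUpdaterAlive() { return statusUpdater.isAlive(); }

  public boolean isStoppedFlag() { return isStopped; }

  public int heartbeatCount() { return heartbeats.get(); }

  // Polls until at least `target` heartbeats were counted or the timeout expires.
  public boolean waitForHeartbeats(int target, long timeoutMillis) {
    long deadline = System.currentTimeMillis() + timeoutMillis;
    while (heartbeats.get() < target && System.currentTimeMillis() < deadline) {
      Thread.yield();
    }
    return heartbeats.get() >= target;
  }
}
```

      With the reported ordering, the sketch ends up in the "lost" state: isStopped reads false, yet no updater thread is alive. Moving the assignment before start() is enough in this model because a call to Thread.start() happens-before every action of the started thread, so the new thread is guaranteed to observe isStopped == false.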
      
      

        Attachments

        1. YARN-8729.001.patch
          1 kB
          Tao Yang
        2. YARN-8729.002.patch
          1 kB
          Weiwei Yang


            People

            • Assignee:
              Tao Yang
              Reporter:
              Tao Yang
            • Votes:
              0
              Watchers:
              4
