  1. Hadoop YARN
  2. YARN-8729

Node status updater thread could be lost after it is restarted


Details

    • Reviewed

    Description

      Today I found a lost NM whose node status updater thread no longer existed after the thread had been restarted. In NodeStatusUpdaterImpl#rebootNodeStatusUpdaterAndRegisterWithRM, the isStopped flag is not reset to false before statusUpdater.start() is executed, so if the new thread starts immediately and finds isStopped == true, it exits without logging anything.

      Key code in NodeStatusUpdaterImpl#rebootNodeStatusUpdaterAndRegisterWithRM:

       statusUpdater.join();
       registerWithRM();
       statusUpdater = new Thread(statusUpdaterRunnable, "Node Status Updater");
       statusUpdater.start();
       this.isStopped = false;   //this line should be moved before statusUpdater.start();
       LOG.info("NodeStatusUpdater thread is reRegistered and restarted");
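
      The ordering problem can be reproduced outside Hadoop with a small sketch. Everything below is a hypothetical stand-in, not the real NodeStatusUpdaterImpl: the Updater class, the firstCheck latch (used only to force the unlucky interleaving deterministically), and the method names are assumptions made for illustration. Only the isStopped flag, the thread name, and the start-before-reset versus reset-before-start ordering come from the report.

```java
import java.util.concurrent.CountDownLatch;

// Hypothetical stand-in for NodeStatusUpdaterImpl; only the flag/thread ordering is modeled.
class RestartOrderingSketch {

    static class Updater {
        // True because the previous updater thread was stopped before the restart.
        private volatile boolean isStopped = true;
        // Demo-only hook: released once the new thread has read isStopped for the first time.
        private final CountDownLatch firstCheck = new CountDownLatch(1);
        private Thread statusUpdater;

        private void runLoop() {
            boolean stoppedOnEntry = isStopped; // first evaluation of the loop condition
            firstCheck.countDown();
            if (stoppedOnEntry) {
                return; // exits silently, as described in the report
            }
            while (!isStopped) {
                try {
                    Thread.sleep(10); // placeholder for sending a heartbeat
                } catch (InterruptedException e) {
                    return;
                }
            }
        }

        // Order from the report: start the thread first, reset the flag afterwards.
        void restartBuggyOrder() {
            statusUpdater = new Thread(this::runLoop, "Node Status Updater");
            statusUpdater.start();
            try {
                firstCheck.await(); // force the case where the thread checks the flag first
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            isStopped = false; // too late: the thread already saw true
        }

        // Proposed order: reset the flag, then start the thread.
        void restartFixedOrder() {
            isStopped = false;
            statusUpdater = new Thread(this::runLoop, "Node Status Updater");
            statusUpdater.start();
        }

        // Waits up to millis for the thread to finish; returns true if it has terminated.
        boolean exitedWithin(long millis) {
            try {
                statusUpdater.join(millis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            return !statusUpdater.isAlive();
        }

        boolean isRunning() {
            return statusUpdater.isAlive();
        }

        void stop() {
            isStopped = true;
            exitedWithin(1000);
        }
    }

    public static void main(String[] args) {
        Updater buggy = new Updater();
        buggy.restartBuggyOrder();
        System.out.println("start-then-reset, thread lost: " + buggy.exitedWithin(1000));

        Updater fixed = new Updater();
        fixed.restartFixedOrder();
        System.out.println("reset-then-start, thread running: " + fixed.isRunning());
        fixed.stop();
    }
}
```

      In the real code the bad interleaving depends on scheduling, which is presumably why the lost thread shows up only occasionally. The latch in the sketch removes that nondeterminism so both lines print true on every run.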
      
      

      Attachments

        1. YARN-8729.002.patch
          1 kB
          Weiwei Yang
        2. YARN-8729.001.patch
          1 kB
          Tao Yang
        3. YARN-8729.001.patch
          1 kB
          Tao Yang


          People

            Assignee: Tao Yang
            Reporter: Tao Yang
            Votes: 0
            Watchers: 5
