Description
Today I found a lost NM whose node status updater thread no longer existed after the thread was restarted. In NodeStatusUpdaterImpl#rebootNodeStatusUpdaterAndRegisterWithRM, the isStopped flag is not reset to false before statusUpdater.start() is called. If the new thread starts running immediately and sees isStopped==true, it exits without writing any log.
Key code in NodeStatusUpdaterImpl#rebootNodeStatusUpdaterAndRegisterWithRM:
statusUpdater.join();
registerWithRM();
statusUpdater = new Thread(statusUpdaterRunnable, "Node Status Updater");
statusUpdater.start();
this.isStopped = false; // this line should be moved before statusUpdater.start();
LOG.info("NodeStatusUpdater thread is reRegistered and restarted");
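Below is a minimal, self-contained sketch of the proposed ordering. It is not the real NodeStatusUpdaterImpl. The class name StatusUpdaterRestartSketch, the heartbeat counter, and the 10 ms loop are hypothetical simplifications. It also assumes that the old updater thread is stopped by setting isStopped = true before join(), and registerWithRM() is left out. The point it illustrates: the volatile write isStopped = false happens before statusUpdater.start(), so the new thread is guaranteed to see false on its first loop check. In the buggy order, the new thread can read the stale true value and exit silently.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical, simplified stand-in for NodeStatusUpdaterImpl (not the real class).
class StatusUpdaterRestartSketch {
    // Mirrors the isStopped flag described in the issue.
    private volatile boolean isStopped = false;
    // Hypothetical counter used only to observe that the loop is running.
    private final AtomicInteger heartbeats = new AtomicInteger();
    private Thread statusUpdater;

    // Simplified updater loop: it exits as soon as it observes isStopped == true.
    private final Runnable statusUpdaterRunnable = () -> {
        while (!isStopped) {
            heartbeats.incrementAndGet(); // stands in for a heartbeat to the RM
            try {
                Thread.sleep(10);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
        }
    };

    void start() {
        statusUpdater = new Thread(statusUpdaterRunnable, "Node Status Updater");
        statusUpdater.start();
    }

    // Restart with the fixed ordering: reset the flag BEFORE starting the new thread.
    void rebootAndRestart() throws InterruptedException {
        isStopped = true;       // assumption: this is how the old thread is told to stop
        statusUpdater.join();   // wait for the old thread to exit
        // registerWithRM() would run here in the real code (omitted).
        isStopped = false;      // must happen before start(); otherwise the new thread may exit at once
        statusUpdater = new Thread(statusUpdaterRunnable, "Node Status Updater");
        statusUpdater.start();
    }

    void stop() throws InterruptedException {
        isStopped = true;
        statusUpdater.join();
    }

    boolean isUpdaterAlive() {
        return statusUpdater != null && statusUpdater.isAlive();
    }

    int heartbeatCount() {
        return heartbeats.get();
    }

    public static void main(String[] args) throws InterruptedException {
        StatusUpdaterRestartSketch nm = new StatusUpdaterRestartSketch();
        nm.start();
        nm.rebootAndRestart();
        System.out.println("updater alive after restart: " + nm.isUpdaterAlive());
        nm.stop();
    }
}
```

Because Thread.start() establishes a happens-before edge with everything the starting thread wrote earlier, this ordering removes the race without adding locks.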