Karthik Kambatla, I wasn't clear in my original text. The patches in YARN-4686 do not break any extra tests. However, while exploring fixes for those failures, I came across an unnecessary wait in the NodeStatusUpdater thread, at NodeStatusUpdaterImpl:850. When a reboot happens, the isStopped variable is set to true, but the thread keeps waiting for the next heartbeat. That heartbeat never comes, so the thread sits idle until the heartbeat wait times out. To avoid wasting that time, I added a notify that wakes the thread up so it continues in the loop, where it finds that isStopped is set to true.
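To illustrate the idea, here is a minimal, self-contained sketch of a heartbeat loop that is woken early when it is stopped. It is not the code from the patch: the class HeartbeatLoopSketch, the heartbeatMonitor field, and the helper measureStopLatencyMs are hypothetical names chosen for illustration, and only isStopped mirrors the variable mentioned above.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical sketch, not the actual NodeStatusUpdaterImpl code.
final class HeartbeatLoopSketch {
    // Monitor the loop waits on between heartbeats (assumed name).
    private final Object heartbeatMonitor = new Object();
    private volatile boolean isStopped = false;
    private final long heartbeatIntervalMs;

    HeartbeatLoopSketch(long heartbeatIntervalMs) {
        this.heartbeatIntervalMs = heartbeatIntervalMs;
    }

    // Loop body: send a heartbeat (omitted), then wait for the next interval.
    void runLoop() throws InterruptedException {
        while (!isStopped) {
            synchronized (heartbeatMonitor) {
                // Re-check under the lock so a stop() that happened just
                // before we acquired the monitor is not missed.
                if (!isStopped) {
                    // A spurious wakeup only causes an early loop iteration.
                    heartbeatMonitor.wait(heartbeatIntervalMs);
                }
            }
        }
    }

    // Set the flag and wake the waiting thread instead of letting it
    // sleep out the rest of the heartbeat interval.
    void stop() {
        synchronized (heartbeatMonitor) {
            isStopped = true;
            heartbeatMonitor.notifyAll();
        }
    }

    // Starts the loop, stops it, and returns how long the thread took to exit.
    static long measureStopLatencyMs(long intervalMs) {
        HeartbeatLoopSketch loop = new HeartbeatLoopSketch(intervalMs);
        Thread worker = new Thread(() -> {
            try {
                loop.runLoop();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        worker.start();
        try {
            Thread.sleep(100); // give the worker time to enter wait()
            long start = System.nanoTime();
            loop.stop();
            worker.join();
            return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while measuring", e);
        }
    }
}
```

With a 10 second interval, the worker in this sketch should exit shortly after stop() is called rather than after the full interval. Without the notifyAll() call, it would stay blocked in wait() until the interval elapsed.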
Adding this optimization uncovered a race condition in the TestNodeManagerResync test. The test doesn't wait for the NM to completely reboot before it checks for its updated capabilities. It only worked before because the unnecessary wait in the NodeStatusUpdater acted as a sleep that masked the race condition.
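One common way to remove this kind of race from a test is to poll for the expected condition with a timeout instead of relying on incidental delays. The sketch below shows that pattern in isolation. It is an assumption about the general approach, not the change made to TestNodeManagerResync: the class WaitForRebootSketch, the waitFor helper, and the rebooted flag standing in for "the NM has finished rebooting" are all hypothetical.

```java
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.BooleanSupplier;

// Hypothetical sketch of a polling wait; not the actual test code.
final class WaitForRebootSketch {

    // Polls the condition every checkEveryMs until it holds or timeoutMs passes.
    static void waitFor(BooleanSupplier condition, long checkEveryMs, long timeoutMs)
            throws TimeoutException, InterruptedException {
        long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
        while (!condition.getAsBoolean()) {
            if (System.nanoTime() - deadline >= 0) {
                throw new TimeoutException("condition not met within " + timeoutMs + " ms");
            }
            Thread.sleep(checkEveryMs);
        }
    }

    // Simulates a reboot that completes asynchronously, then waits for it
    // explicitly before any follow-up check would run.
    static boolean demo() {
        AtomicBoolean rebooted = new AtomicBoolean(false); // stand-in for NM state
        Thread reboot = new Thread(() -> {
            try {
                Thread.sleep(200); // simulated reboot work
                rebooted.set(true);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        reboot.start();
        try {
            waitFor(rebooted::get, 10, 5_000);
        } catch (TimeoutException e) {
            return false;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        // Only at this point is it safe to assert on post-reboot state.
        return rebooted.get();
    }
}
```

The point of the pattern is that the test's correctness no longer depends on how long some unrelated thread happens to sleep.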
I'm uploading a patch that removes the unnecessary wait in the NodeStatusUpdater thread and also fixes the race condition in TestNodeManagerResync that it uncovers.