Three threads in the JobTracker update TaskStatuses: ExpireTrackers, ExpireLaunchingTasks and the heartbeat threads.
1. If heartbeat runs first, there is no issue with the ExpireTrackers and ExpireLaunchingTasks threads, since the entry will be removed from them.
2. If ExpireTrackers runs first, there is no issue with heartbeat (the tracker will be re-initialized) or ExpireLaunchingTasks (the task entry will be removed from that thread).
3. If ExpireLaunchingTasks runs first,
a) If ExpireTrackers runs second, the task attempt will be killed again: it will receive a FAILED update followed by a KILLED update.
b) If heartbeat runs second and the TaskStatus is UNASSIGNED, RUNNING, COMMIT_PENDING or SUCCEEDED, the task is added to the taskToKill map and the update is ignored, so the task will be sent a KillTaskAction (fixed by HADOOP-5280). If the state is COMMIT_PENDING, it throws an NPE as described in this jira.
I propose that the fix for this issue address both 3(a) and 3(b).
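
To illustrate the kind of guard that would cover case 3(a), here is a minimal sketch. It is not the JobTracker code: TaskStatusGuard, launch(), update() and get() are hypothetical names, and the State enum only mirrors the states mentioned above. It assumes that once an attempt has been marked FAILED or KILLED, any later update from another thread should be dropped. The check for an unknown attempt is a generic defensive measure only; it does not claim to reproduce the cause of the NPE described in this jira.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for illustration; not the real JobTracker classes.
class TaskStatusGuard {
    enum State { UNASSIGNED, RUNNING, COMMIT_PENDING, SUCCEEDED, FAILED, KILLED }

    // attempt id -> last accepted state
    private final Map<String, State> states = new HashMap<>();

    /** Registers a new attempt in the UNASSIGNED state. */
    synchronized void launch(String attemptId) {
        states.put(attemptId, State.UNASSIGNED);
    }

    /**
     * Applies an update unless the attempt is unknown or has already been
     * marked FAILED or KILLED. Returns true if the update was applied.
     */
    synchronized boolean update(String attemptId, State newState) {
        State current = states.get(attemptId);
        if (current == null) {
            return false; // unknown attempt: ignore instead of dereferencing null
        }
        if (current == State.FAILED || current == State.KILLED) {
            return false; // assumption: the first FAILED/KILLED update wins
        }
        states.put(attemptId, newState);
        return true;
    }

    /** Returns the last accepted state, or null for an unknown attempt. */
    synchronized State get(String attemptId) {
        return states.get(attemptId);
    }
}
```

With such a guard, the sequence in 3(a) (ExpireLaunchingTasks marks the attempt FAILED, then ExpireTrackers tries to mark it KILLED) would leave the attempt FAILED and drop the second update.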