Hi Junping Du, sorry, I missed the mail. The job is map-only and has a single map task. Once the map attempt and the task are SUCCEEDED, the job transitions from RUNNING to COMMITTING. If, at this point, the succeeded attempt is killed as part of container preemption, a T_ATTEMPT_KILLED event is raised and the task transitions from SUCCEEDED to SCHEDULED. TaskImpl#RetroactiveKilledTransition then tells the job about the rescheduling by raising both JOB_TASK_ATTEMPT_COMPLETED and JOB_MAP_TASK_RESCHEDULED. The job receives both events while in the COMMITTING state and fails, because neither transition is handled there.
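To make the sequence concrete, here is a toy Java model of it. It is a sketch under stated assumptions, not the MapReduce AM code: `ToyTask`, `ToyJob`, `onAttemptKilled`, and the `diagnostics` list are hypothetical, only the state and event names come from the description above, and the job failing is modelled simply as an ERROR state.

```java
import java.util.ArrayList;
import java.util.EnumMap;
import java.util.List;
import java.util.Map;

// Hypothetical toy model of the race; not Hadoop's TaskImpl/JobImpl.
public class CommitPreemptionRace {

    enum TaskState { SCHEDULED, SUCCEEDED }
    enum JobState { RUNNING, COMMITTING, ERROR }
    enum JobEventType { JOB_TASK_ATTEMPT_COMPLETED, JOB_MAP_TASK_RESCHEDULED }

    /** Toy map task whose only attempt has already succeeded. */
    static class ToyTask {
        TaskState state = TaskState.SUCCEEDED;

        // Loosely mirrors the described effect of RetroactiveKilledTransition:
        // a killed SUCCEEDED attempt sends the task back to SCHEDULED and the
        // job is notified with two events.
        List<JobEventType> onAttemptKilled() {
            List<JobEventType> toJob = new ArrayList<>();
            if (state == TaskState.SUCCEEDED) {
                state = TaskState.SCHEDULED;
                toJob.add(JobEventType.JOB_TASK_ATTEMPT_COMPLETED);
                toJob.add(JobEventType.JOB_MAP_TASK_RESCHEDULED);
            }
            return toJob;
        }
    }

    /** Toy job: an event with no registered transition for the current state fails the job. */
    static class ToyJob {
        final Map<JobState, Map<JobEventType, JobState>> table = new EnumMap<>(JobState.class);
        final List<String> diagnostics = new ArrayList<>();
        JobState state;

        ToyJob(JobState initial) {
            state = initial;
            // RUNNING handles both events; COMMITTING handles neither (the reported bug).
            addTransition(JobState.RUNNING, JobState.RUNNING, JobEventType.JOB_TASK_ATTEMPT_COMPLETED);
            addTransition(JobState.RUNNING, JobState.RUNNING, JobEventType.JOB_MAP_TASK_RESCHEDULED);
        }

        void addTransition(JobState from, JobState to, JobEventType ev) {
            table.computeIfAbsent(from, k -> new EnumMap<>(JobEventType.class)).put(ev, to);
        }

        void handle(JobEventType ev) {
            if (state == JobState.ERROR) {
                return; // ERROR is terminal in this toy model
            }
            Map<JobEventType, JobState> row = table.get(state);
            JobState next = (row == null) ? null : row.get(ev);
            if (next == null) {
                diagnostics.add("Invalid event " + ev + " at " + state);
                state = JobState.ERROR; // stands in for the job failing
            } else {
                state = next;
            }
        }
    }

    public static void main(String[] args) {
        ToyTask task = new ToyTask();
        ToyJob job = new ToyJob(JobState.COMMITTING); // the single map task already succeeded
        for (JobEventType ev : task.onAttemptKilled()) {
            job.handle(ev);
        }
        System.out.println("task=" + task.state + " job=" + job.state + " " + job.diagnostics);
    }
}
```

Running `main` leaves the task SCHEDULED and the job in ERROR with a single diagnostic for JOB_TASK_ATTEMPT_COMPLETED at COMMITTING, since the toy job registers no COMMITTING transitions at all.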
It looks like the fix can ignore JOB_TASK_ATTEMPT_COMPLETED, but not JOB_MAP_TASK_RESCHEDULED: instead, the COMMITTING job should move back to the RUNNING state and reschedule the map task, as below.
.addTransition(JobStateInternal.COMMITTING, JobStateInternal.RUNNING,
    JobEventType.JOB_MAP_TASK_RESCHEDULED,
    new MapTaskRescheduledTransition())
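The same toy approach can illustrate the proposed change. Again, this is a hypothetical sketch and not JobImpl: `ToyJob`, its `addTransition`, and `mapRescheduled` (a stand-in for what MapTaskRescheduledTransition is assumed to do here, namely stop counting the map as succeeded) are assumptions. The `withFix` flag toggles the two COMMITTING entries: a self-loop that ignores JOB_TASK_ATTEMPT_COMPLETED and a COMMITTING to RUNNING arc on JOB_MAP_TASK_RESCHEDULED.

```java
import java.util.ArrayList;
import java.util.EnumMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical toy job state machine; not Hadoop's JobImpl or StateMachineFactory.
public class CommittingRescheduleSketch {

    enum JobState { RUNNING, COMMITTING, ERROR }
    enum JobEventType { JOB_TASK_COMPLETED, JOB_TASK_ATTEMPT_COMPLETED, JOB_MAP_TASK_RESCHEDULED }

    static class ToyJob {
        final Map<JobState, Map<JobEventType, Function<ToyJob, JobState>>> table =
                new EnumMap<>(JobState.class);
        final List<String> diagnostics = new ArrayList<>();
        final int totalMaps;
        int succeededMaps;
        JobState state = JobState.RUNNING;

        ToyJob(int totalMaps, boolean withFix) {
            this.totalMaps = totalMaps;
            // RUNNING: count succeeded maps and enter COMMITTING once all are done.
            addTransition(JobState.RUNNING, JobEventType.JOB_TASK_COMPLETED, j -> {
                j.succeededMaps++;
                return j.succeededMaps == j.totalMaps ? JobState.COMMITTING : JobState.RUNNING;
            });
            addTransition(JobState.RUNNING, JobEventType.JOB_TASK_ATTEMPT_COMPLETED, j -> JobState.RUNNING);
            addTransition(JobState.RUNNING, JobEventType.JOB_MAP_TASK_RESCHEDULED, ToyJob::mapRescheduled);
            if (withFix) {
                // Proposed: ignore the attempt notification while committing...
                addTransition(JobState.COMMITTING, JobEventType.JOB_TASK_ATTEMPT_COMPLETED, j -> JobState.COMMITTING);
                // ...but go back to RUNNING when a succeeded map is rescheduled.
                addTransition(JobState.COMMITTING, JobEventType.JOB_MAP_TASK_RESCHEDULED, ToyJob::mapRescheduled);
            }
        }

        // Hypothetical stand-in for MapTaskRescheduledTransition: the map no longer counts as succeeded.
        JobState mapRescheduled() {
            succeededMaps--;
            return JobState.RUNNING;
        }

        void addTransition(JobState from, JobEventType ev, Function<ToyJob, JobState> hook) {
            table.computeIfAbsent(from, k -> new EnumMap<>(JobEventType.class)).put(ev, hook);
        }

        void handle(JobEventType ev) {
            if (state == JobState.ERROR) {
                return; // ERROR is terminal in this toy model
            }
            Map<JobEventType, Function<ToyJob, JobState>> row = table.get(state);
            Function<ToyJob, JobState> hook = (row == null) ? null : row.get(ev);
            if (hook == null) {
                diagnostics.add("Invalid event " + ev + " at " + state);
                state = JobState.ERROR;
            } else {
                state = hook.apply(this);
            }
        }
    }

    public static void main(String[] args) {
        for (boolean withFix : new boolean[] {false, true}) {
            ToyJob job = new ToyJob(1, withFix);
            job.handle(JobEventType.JOB_TASK_COMPLETED);         // single map succeeds -> COMMITTING
            job.handle(JobEventType.JOB_TASK_ATTEMPT_COMPLETED); // preempted attempt reported
            job.handle(JobEventType.JOB_MAP_TASK_RESCHEDULED);   // map task rescheduled
            job.handle(JobEventType.JOB_TASK_COMPLETED);         // re-run of the map succeeds
            System.out.println("withFix=" + withFix + " -> " + job.state + " " + job.diagnostics);
        }
    }
}
```

Without the fix, the toy job errors on the first event delivered during COMMITTING; with it, the job drops back to RUNNING, and the re-run map's JOB_TASK_COMPLETED takes it to COMMITTING again.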
Please share your comments.