Details
- Type: Bug
- Status: Closed
- Priority: Major
- Resolution: Fixed
- 0.23.3, 2.0.2-alpha
- None
- None
Description
We found some jobs stuck in KILL_WAIT for days on end. The RM shows them as RUNNING. The AM shows each job in the KILL_WAIT state, with a few maps still running. All of these maps were scheduled on nodes that are now in the RM's Lost nodes list. The running maps are in the FAIL_CONTAINER_CLEANUP state.
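The symptom can be pictured with a toy model. This is only a sketch, not Hadoop's code. It assumes (the report does not state this) that a job in KILL_WAIT finishes only once every task attempt reaches a terminal state, and that an attempt leaves FAIL_CONTAINER_CLEANUP only when a cleanup confirmation arrives from its node. All class and function names are hypothetical.

```python
from enum import Enum, auto


class AttemptState(Enum):
    RUNNING = auto()
    FAIL_CONTAINER_CLEANUP = auto()
    KILLED = auto()


class JobState(Enum):
    RUNNING = auto()
    KILL_WAIT = auto()
    KILLED = auto()


# Assumption of this toy model: only KILLED attempts stop blocking the job.
TERMINAL_ATTEMPT_STATES = {AttemptState.KILLED}


class ToyAttempt:
    def __init__(self, node):
        self.node = node
        self.state = AttemptState.RUNNING

    def kill(self):
        # Killing starts container cleanup; the attempt then waits for confirmation.
        if self.state is AttemptState.RUNNING:
            self.state = AttemptState.FAIL_CONTAINER_CLEANUP

    def cleanup_done(self):
        # Confirmation that the container on the node is gone.
        if self.state is AttemptState.FAIL_CONTAINER_CLEANUP:
            self.state = AttemptState.KILLED


class ToyJob:
    def __init__(self, attempts):
        self.attempts = attempts
        self.state = JobState.RUNNING

    def kill(self):
        self.state = JobState.KILL_WAIT
        for attempt in self.attempts:
            attempt.kill()
        self.check_done()

    def check_done(self):
        # The job leaves KILL_WAIT only when every attempt is terminal.
        if self.state is JobState.KILL_WAIT and all(
            a.state in TERMINAL_ATTEMPT_STATES for a in self.attempts
        ):
            self.state = JobState.KILLED


def simulate(lost_nodes):
    """Kill a two-attempt job; lost nodes never confirm container cleanup."""
    attempts = [ToyAttempt("node1"), ToyAttempt("node2")]
    job = ToyJob(attempts)
    job.kill()
    for attempt in attempts:
        if attempt.node not in lost_nodes:
            attempt.cleanup_done()
    job.check_done()
    return job.state, [a.state for a in attempts]
```

In this model, `simulate(set())` ends with the job KILLED, while `simulate({"node2"})` leaves the job in KILL_WAIT with one attempt parked in FAIL_CONTAINER_CLEANUP, which matches the observed hang.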
Attachments
Issue Links
- duplicates
  - MAPREDUCE-4744 Application Master is running forever when the TaskAttempt gets TA_KILL event at the state SUCCESS_CONTAINER_CLEANUP (Resolved)
  - MAPREDUCE-4745 Application Master is hanging when the TaskImpl gets T_KILL event and completes attempts by the time (Resolved)
- is blocked by
  - MAPREDUCE-4748 Invalid event: T_ATTEMPT_SUCCEEDED at SUCCEEDED (Closed)