Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.23.3, 2.0.2-alpha
    • Fix Version/s: 2.0.3-alpha, 0.23.5
    • Component/s: None
    • Labels: None

      Description

      We found some jobs that were stuck in KILL_WAIT for days on end. The RM shows them as RUNNING. When you go to the AM, it shows the job in the KILL_WAIT state, with a few maps still running. All of these maps were scheduled on nodes which are now in the RM's Lost Nodes list. The running maps are in the FAIL_CONTAINER_CLEANUP state.

      Attachments

      1. MAPREDUCE-4751-20121108.txt
        12 kB
        Vinod Kumar Vavilapalli
      2. MAPREDUCE-4751-20121109.txt
        22 kB
        Vinod Kumar Vavilapalli
      3. MR-4751-branch-0.23.txt
        22 kB
        Robert Joseph Evans
      4. TaskAttemptStateGraph.jpg
        417 kB
        Ravi Prakash


          Activity

          Hudson added a comment -

          Integrated in Hadoop-Mapreduce-trunk #1256 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1256/)
          MAPREDUCE-4751. AM stuck in KILL_WAIT for days (vinodkv via bobby) (Revision 1408314)

          Result = SUCCESS
          bobby : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1408314
          Files :

          • /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/job/impl/JobImpl.java
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/job/impl/TaskImpl.java
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/MRApp.java
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/TestKill.java
          Hudson added a comment -

          Integrated in Hadoop-Hdfs-trunk #1225 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk/1225/)
          MAPREDUCE-4751. AM stuck in KILL_WAIT for days (vinodkv via bobby) (Revision 1408314)

          Result = SUCCESS
          bobby : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1408314
          Files :

          • /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/job/impl/JobImpl.java
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/job/impl/TaskImpl.java
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/MRApp.java
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/TestKill.java
          Hudson added a comment -

          Integrated in Hadoop-Hdfs-0.23-Build #434 (See https://builds.apache.org/job/Hadoop-Hdfs-0.23-Build/434/)
          svn merge -c 1408314 FIXES: MAPREDUCE-4751. AM stuck in KILL_WAIT for days (vinodkv via bobby) (Revision 1408349)

          Result = SUCCESS
          bobby : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1408349
          Files :

          • /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/CHANGES.txt
          • /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/job/impl/JobImpl.java
          • /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/job/impl/TaskImpl.java
          • /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/MRApp.java
          • /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/TestKill.java
          Hudson added a comment -

          Integrated in Hadoop-Yarn-trunk #35 (See https://builds.apache.org/job/Hadoop-Yarn-trunk/35/)
          MAPREDUCE-4751. AM stuck in KILL_WAIT for days (vinodkv via bobby) (Revision 1408314)

          Result = SUCCESS
          bobby : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1408314
          Files :

          • /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/job/impl/JobImpl.java
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/job/impl/TaskImpl.java
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/MRApp.java
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/TestKill.java
          Vinod Kumar Vavilapalli added a comment -

          Thanks for all the help, Bobby and Jason!

          Robert Joseph Evans added a comment -

          Thanks Vinod,

          I put this into trunk, branch-2, and branch-0.23

          Jason Lowe added a comment -

          The patch for branch-0.23 looks like a good port to me, +1.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12553129/MR-4751-branch-0.23.txt
          against trunk revision .

          -1 patch. The patch command could not apply the patch.

          Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3013//console

          This message is automatically generated.

          Robert Joseph Evans added a comment -

          I merged the patch to trunk and branch-2 with no problems, but branch-0.23 has a few merge conflicts. I think I have resolved all of the conflicts correctly, but I would like another pair of eyes to confirm that.

          I am running unit tests and so far they all look OK.

          Hudson added a comment -

          Integrated in Hadoop-trunk-Commit #2999 (See https://builds.apache.org/job/Hadoop-trunk-Commit/2999/)
          MAPREDUCE-4751. AM stuck in KILL_WAIT for days (vinodkv via bobby) (Revision 1408314)

          Result = SUCCESS
          bobby : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1408314
          Files :

          • /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/job/impl/JobImpl.java
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/job/impl/TaskImpl.java
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/MRApp.java
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/TestKill.java
          Robert Joseph Evans added a comment -

          The patch looks OK to me. It passes all of the tests, so I am +1 on it. We may be able to make things simpler, as was stated, but for the time being I think this is OK.

          Hadoop QA added a comment -

          +1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12552951/MAPREDUCE-4751-20121109.txt
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 2 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3007//testReport/
          Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3007//console

          This message is automatically generated.

          Vinod Kumar Vavilapalli added a comment -

          Part of the issue is that the job is hanging around waiting for all tasks to be killed rather than just exiting and letting YARN shoot any straggling containers. I think it would be simpler/safer for the AM to just write out the final state stuff and exit, much like it does for the FAILED state. If job's KILL_WAIT really is necessary then we'd need a corresponding FAILED_WAIT state to handle waiting for task cleanup when a job fails.

          I agree. Sharad and I debated this for a while when we wrote this initially. We left it the way it is now just to be sure that AMs exit sanely, but we can change it. The only catch I can think of is that while the AM tries to do the remaining cleanup work (jobhistory etc.), tasks will keep bombarding the AM with more updates.

          I didn't realize that we don't have a FAILED_WAIT state.

          The change isn't much bigger, but it can break tests. Let's pursue that separately.

          The current bug is caused by Tasks waiting on TAs, which should be fixed by my patch. Of course, it then opens up the Job bug; let's fix that separately.

          Regarding doing away with the Task's KILL_WAIT, I disagree. Tasks can get a kill signal while the AM is running, so we should handle it explicitly by killing and waiting for all attempts; otherwise we run the risk of dangling JVMs doing nothing but occupying slots until the AM exits.

          Vinod Kumar Vavilapalli added a comment -

          Addressed Bobby's comments on my earlier patch.

          • Agree about the HashSet. Started doing bitmaps, but it made the code unreadable. Keeping HashSet, but with an explicit initial capacity of 2 instead of the default 16; see the sketch after this list. Could've been 1, but HashSet/HashMap immediately resizes it to two.
          • Addressed other changes.
          • Wrote up a test which passes with the changes and fails without them. Had to spend a lot of time to get it right.
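
          A minimal sketch of the sizing choice described above; the class and field names are illustrative, not from the actual patch:

              import java.util.HashSet;
              import java.util.Set;

              public class FinishedAttemptsSketch {
                public static void main(String[] args) {
                  // A task rarely has more than two attempts in flight, so an
                  // explicit initial capacity of 2 beats HashSet's default of 16
                  // when a job holds thousands of TaskImpl instances, each
                  // carrying several such sets.
                  Set<String> finishedAttempts = new HashSet<String>(2);
                  finishedAttempts.add("attempt_1_m_000000_0");
                  System.out.println(finishedAttempts.size()); // 1
                }
              }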
          Robert Joseph Evans added a comment -

          Yes, I think that would be better. But that is a much larger change that would need more tests. Perhaps we can do that in a follow-on JIRA.

          Jason Lowe added a comment -

          Part of the issue is that the job is hanging around waiting for all tasks to be killed rather than just exiting and letting YARN shoot any straggling containers. I think it would be simpler/safer for the AM to just write out the final state stuff and exit, much like it does for the FAILED state. If job's KILL_WAIT really is necessary then we'd need a corresponding FAILED_WAIT state to handle waiting for task cleanup when a job fails.

          If we don't need the job's KILL_WAIT state then similarly we can probably ditch the task KILL_WAIT state – it could just send off kills to all the corresponding task attempts and sit in the KILLED state. Does it really need to wait?

          Removing KILL_WAIT is quite a bit bigger change than the current one, as it would break a lot of tests that know about and expect the KILL_WAIT state. However, I think it would be more robust in the long term, as KILL_WAIT seems like a state primed for hanging if we don't really need it. Since we're eager to get a fix for this in soon, we could address that in a follow-up JIRA.

          Robert Joseph Evans added a comment -

          I have been doing a quick once over on this, and I have a few comments.

          1. I think it would be cleaner for KillWaitAttemptKilledTransition to have a constructor that takes a TaskAttemptCompletionEventStatus, instead of having the subclasses set it directly themselves.
          2. Remove the commented out if statement.
          3. I am not sure if HashSet is the correct data type for success, failed, etc. They are likely to be sparse arrays with small amounts of data in them. Probably not very important, but if there are thousands of tasks it starts to add up.

          Overall it looks OK. I would like to see more tests, though.

          Vinod Kumar Vavilapalli added a comment -

          Here's a first attempt at the patch. Very raw, no tests yet. I want to be sure that I am understanding your comments correctly.

          Bobby/Ravi/Jason, can you please have a quick look at it? Tx.

          I get a feeling we need to do something similar in Job as well, even though it will not be the current bug, assuming TaskImpl itself is what is stuck today.

          Robert Joseph Evans added a comment -

          Yes, it looks very much like this can also happen in branch-2 and trunk. I also wanted to mention that the stack traces showed more or less nothing. All of the threads were waiting on I/O or event queues; nothing was actually processing any data or deadlocked holding locks.

          Robert Joseph Evans added a comment -

          Looking at the UI for one of the jobs that is stuck in this state, and at a heap dump for that AM, I can see that the Job is in KILL_WAIT and so are many of its tasks. But for all of the tasks in KILL_WAIT that I looked at, the task attempts are all FAILED, and none of them failed because of a node that disappeared. It looks very much like TaskImpl just needs to be able to handle T_ATTEMPT_FAILED and T_ATTEMPT_SUCCEEDED in the KILL_WAIT state, instead of ignoring them. I will look to see if this also exists in 2.0. I think all we need to do to reproduce this is to launch a large job that will have most of its tasks fail, and then try to kill it before the job fails on its own.
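
          For illustration, in the same StateMachineFactory style as the TaskImpl snippet quoted below, handling those events would mean moving them out of the ignore list and giving them real arcs. The transition class names here are guesses modeled on this discussion, not the patch itself:

              // Hypothetical sketch: give KILL_WAIT real arcs for attempt
              // completion events instead of listing them as ignore-able.
              .addTransition(TaskStateInternal.KILL_WAIT,
                  EnumSet.of(TaskStateInternal.KILL_WAIT, TaskStateInternal.KILLED),
                  TaskEventType.T_ATTEMPT_FAILED,
                  new KillWaitAttemptFailedTransition())    // hypothetical name
              .addTransition(TaskStateInternal.KILL_WAIT,
                  EnumSet.of(TaskStateInternal.KILL_WAIT, TaskStateInternal.KILLED),
                  TaskEventType.T_ATTEMPT_SUCCEEDED,
                  new KillWaitAttemptSucceededTransition()) // hypothetical name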

          This particular job had 2645 map tasks: 634 of them got stuck in KILL_WAIT, 1347 were successfully killed, and 623 finished with a SUCCESS. This was running on a 2,000-node cluster. The failed tasks appeared to take about 20 seconds before they failed, but the last attempts to fail all ended within a second of each other.

          Ravi Prakash added a comment -

          This is fine. Job waits for all tasks and taskAttempts to 'finish', not just killed. In this case, TA will succeed and inform the job about the same, so that the job doesn't wait for this task anymore.

          Vinod! I'm sorry, I might not be understanding how this happens. In TaskImpl:

              // Ignore-able transitions.
              .addTransition(
                  TaskStateInternal.KILL_WAIT,
                  TaskStateInternal.KILL_WAIT,
                  EnumSet.of(TaskEventType.T_KILL,
                      TaskEventType.T_ATTEMPT_LAUNCHED,
                      TaskEventType.T_ATTEMPT_COMMIT_PENDING,
                      TaskEventType.T_ATTEMPT_FAILED,
                      TaskEventType.T_ATTEMPT_SUCCEEDED,
                      TaskEventType.T_ADD_SPEC_ATTEMPT))
          

          So when the TaskAttemptImpl does indeed send T_ATTEMPT_SUCCEEDED, it is ignored by the TaskImpl, and its state stays KILL_WAIT. Am I missing something? Can you please point me to the code path?

          Robert Joseph Evans added a comment -

          I am still nervous about pulling in a big change like MAPREDUCE-3353 just to fix a Major bug. I am not going to block this going in if you come up with a patch, but I really want to beat on the patch before we pull it into 0.23. I just want to be sure that it fixes the issue, and does not destabilize anything. This is only a Major bug because the only time the job gets stuck is when a user sends it a kill command, so the user already wants the job to go away. The job's tasks do go away, but the AM gets stuck and is taking up a small amount of resources on the queue, which is bad, but not the end of the world.

          There isn't anything like a missed state that is causing this issue if I understand Ravi's issue description correctly.

          Obviously, this could be wrong.

          You are correct that the task attempt's state machine cannot really fix this unless it lies, which would be an ugly hack, but it seems that it is not the task attempt that is getting stuck. I was thinking that KILL_WAIT is waiting for the wrong things. In TaskImpl, KILL_WAIT ignores T_ATTEMPT_FAILED and T_ATTEMPT_SUCCEEDED, when it should actually be keeping track of all pending attempts and exit KILL_WAIT once all pending attempts have exited, whether with a kill, success, or failure. It is a bug for TaskImpl to assume that as soon as it sends a KILL to a task attempt, the kill will beat out all other events and kill the attempt. JobImpl's state machine appears to do something like this already.
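
          A hedged sketch of that bookkeeping, with hypothetical class and method names (the real fix lives in TaskImpl's transition classes):

              import java.util.HashSet;
              import java.util.Set;

              // Illustrative only: exit KILL_WAIT once every in-flight attempt
              // has terminated, whether it was killed, succeeded, or failed.
              public class KillWaitTracker {
                private final Set<String> liveAttempts;
                private final Set<String> finishedAttempts = new HashSet<String>(2);

                public KillWaitTracker(Set<String> liveAttempts) {
                  this.liveAttempts = liveAttempts;
                }

                // Invoked for T_ATTEMPT_KILLED, T_ATTEMPT_SUCCEEDED, and
                // T_ATTEMPT_FAILED alike while the task sits in KILL_WAIT.
                public boolean attemptTerminated(String attemptId) {
                  finishedAttempts.add(attemptId);
                  // Leave KILL_WAIT only when no pending attempt remains.
                  return finishedAttempts.containsAll(liveAttempts);
                }
              }

          Once attemptTerminated returns true, the task can move out of KILL_WAIT and notify the job, instead of waiting forever for kill acknowledgements that will never come.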

          Vinod Kumar Vavilapalli added a comment -

          There isn't anything like a missed state that is causing this issue if I understand Ravi's issue description correctly.

          Obviously, this could be wrong.

          Ravi, if you have one of these stuck AMs lying around, can you take a thread dump please?

          Vinod Kumar Vavilapalli added a comment -

          Afterwards, the Task Attempt transitions from SUCCESS_CONTAINER_CLEANUP to SUCCEEDED. In either of these states TA_KILL is ignored. So the Task stays in KILL_WAIT and consequently the Job too.

          This is fine. Job waits for all tasks and taskAttempts to 'finish', not just killed. In this case, TA will succeed and inform the job about the same, so that the job doesn't wait for this task anymore.

          I am rather nervous about back porting MAPREDUCE-3353. It is a major feature that has a significant footprint and was not all that stable when it first went in. I know that it has since stabilized but I am still nervous about such a large change.

          Understood that it is a big change, but if we want to address this issue, we need that patch. Given that MAPREDUCE-3353 has hardened on trunk, we should consider pulling it into 0.23.

          It seems like it would be simpler to handle the KILL events in the states that missed it.

          There isn't anything like a missed state that is causing this issue if I understand Ravi's issue description correctly.

          Robert Joseph Evans added a comment -

          I am rather nervous about back porting MAPREDUCE-3353. It is a major feature that has a significant footprint and was not all that stable when it first went in. I know that it has since stabilized but I am still nervous about such a large change. It seems like it would be simpler to handle the KILL events in the states that missed it.

          Ravi Prakash added a comment -

          This is my best guess for what is happening. Imagine a job is running and we send it the KILL signal. The job transitions from RUNNING to KILL_WAIT. The task transitions from RUNNING to KILL_WAIT. However, some task attempts may be in the COMMIT_PENDING state. Pasting the state graph here for reference.

          If the TA_DONE event is queued in the event queue before the TA_KILL event, then the task attempt transitions from COMMIT_PENDING to SUCCESS_CONTAINER_CLEANUP (when we would think it should've transitioned to KILL_CONTAINER_CLEANUP, because, hey, we sent it TA_KILL while it was in COMMIT_PENDING). Afterwards, the task attempt transitions from SUCCESS_CONTAINER_CLEANUP to SUCCEEDED. In either of these states TA_KILL is ignored, so the Task stays in KILL_WAIT, and consequently the Job does too.
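
          A minimal, self-contained sketch of that race, using the event and state names above; the dispatch loop is illustrative, not the real TaskAttemptImpl:

              import java.util.Arrays;
              import java.util.List;

              // Illustrative only: shows why TA_KILL is lost when TA_DONE is
              // dequeued first.
              public class CommitPendingRaceSketch {
                enum TAState { COMMIT_PENDING, SUCCESS_CONTAINER_CLEANUP, KILL_CONTAINER_CLEANUP }
                enum TAEvent { TA_DONE, TA_KILL }

                static TAState state = TAState.COMMIT_PENDING;

                static void handle(TAEvent event) {
                  if (state == TAState.COMMIT_PENDING) {
                    // Whichever event is dequeued first decides the attempt's fate.
                    state = (event == TAEvent.TA_DONE)
                        ? TAState.SUCCESS_CONTAINER_CLEANUP
                        : TAState.KILL_CONTAINER_CLEANUP;
                  }
                  // Otherwise the event is dropped: in SUCCESS_CONTAINER_CLEANUP
                  // (and later SUCCEEDED) TA_KILL is ignored, so the Task waiting
                  // in KILL_WAIT is never told its kill took effect.
                }

                public static void main(String[] args) {
                  // TA_DONE was queued ahead of TA_KILL: the kill is silently lost.
                  List<TAEvent> queue = Arrays.asList(TAEvent.TA_DONE, TAEvent.TA_KILL);
                  for (TAEvent e : queue) {
                    handle(e);
                  }
                  System.out.println(state); // SUCCESS_CONTAINER_CLEANUP, not KILL_CONTAINER_CLEANUP
                }
              }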

          Vinod Kumar Vavilapalli added a comment -

          We should backport MAPREDUCE-3353 to 0.23. That automatically fixes this issue, in that the AM acts on lost nodes and kills the corresponding TaskAttempts, which in turn avoids the Job getting stuck in the KILL_WAIT state.

          Will do the backport.


            People

             • Assignee: Vinod Kumar Vavilapalli
             • Reporter: Ravi Prakash
             • Votes: 0
             • Watchers: 9
