[TEZ-3462] Task attempt failure during container shutdown loses useful container diagnostics - ASF JIRA

Details

Type: Bug
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: 0.7.1
Fix Version/s: 0.9.0, 0.8.5
Component/s: None
Labels: None
Hadoop Flags: Reviewed

Description

When a NodeManager (NM) kills a task attempt for excessive memory usage, it sends a SIGTERM followed by a SIGKILL. It also sends a useful diagnostic message with the container completion event to the RM, and that message eventually reaches the AM on a subsequent heartbeat.

However, if JVM shutdown processing causes an error in the task (e.g., a filesystem being closed by a shutdown hook), the task attempt can report a failure before the useful NM diagnostic reaches the AM. The AM then records that other error as the task failure reason. By the time the container completion status arrives, the AM no longer associates it with the task attempt, and the useful information is lost.
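
One way to avoid the race, in the spirit of the linked MAPREDUCE-6002, is for the task to stop reporting failures once the JVM has begun shutting down, so the NM diagnostic is the only failure reason the AM sees. The following is a minimal sketch of that idea only. It is not the attached patch, and `ShutdownAwareReporter`, `markShuttingDown`, and `reportFailure` are hypothetical names rather than Tez APIs. It also assumes the flag is set before other shutdown hooks do any damage, which plain JVM shutdown hooks do not guarantee because they run in unspecified order.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical sketch: drops task failure reports made after shutdown starts. */
class ShutdownAwareReporter {
    // Set once the JVM (or the container) starts shutting down.
    private final AtomicBoolean shuttingDown = new AtomicBoolean(false);

    // Stand-in for "failures sent to the AM"; a real task would use its
    // umbilical/RPC channel here instead of a list.
    private final List<String> reported =
        Collections.synchronizedList(new ArrayList<>());

    /** Registers a JVM shutdown hook that flips the flag (e.g. on SIGTERM). */
    void installShutdownHook() {
        Runtime.getRuntime().addShutdownHook(new Thread(this::markShuttingDown));
    }

    /** Marks shutdown as started; also callable directly for testing. */
    void markShuttingDown() {
        shuttingDown.set(true);
    }

    /**
     * Reports a failure unless shutdown is in progress.
     * Returns true if the failure was reported, false if it was suppressed.
     */
    boolean reportFailure(String diagnostics) {
        if (shuttingDown.get()) {
            // Likely a side effect of shutdown (e.g. a closed filesystem);
            // stay silent so the container diagnostic can be used instead.
            return false;
        }
        reported.add(diagnostics);
        return true;
    }

    /** Returns the failures that were actually reported. */
    List<String> reported() {
        return reported;
    }
}
```

With this guard, an error raised while the JVM is exiting is suppressed, while genuine failures raised before shutdown are still reported as usual.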

Attachments

TEZ-3462.001.patch (5 kB), 06/Oct/16 21:21, Eric Badger

Issue Links

relates to: MAPREDUCE-6002 MR task should prevent report error to AM when process is shutting down (Closed)

People

Assignee: Eric Badger

Reporter: Jason Darrell Lowe

Votes: 0

Watchers: 6

Dates

Created: 06/Oct/16 13:57

Updated: 14/Mar/17 03:49

Resolved: 23/Jan/17 19:23