[MAPREDUCE-3738] NM can hang during shutdown if AppLogAggregatorImpl thread dies unexpectedly - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Critical
Resolution: Fixed
Affects Version/s: 0.23.1, 2.0.0-alpha
Fix Version/s: 0.23.2
Component/s: mrv2, nodemanager
Labels: None
Target Version/s: 0.23.2
Hadoop Flags: Reviewed
Release Note: Committed to trunk and branch-0.23. Thanks Jason.

Description

If an AppLogAggregator thread dies unexpectedly (e.g., from an uncaught exception such as the OutOfMemoryError in the case I saw), the nodemanager hangs during shutdown. The NM calls AppLogAggregatorImpl.join() during shutdown to make sure log aggregation has completed, and that method waits internally for an atomic boolean that the log aggregation thread sets to indicate it has finished. Because the thread was already killed by the uncaught exception, the boolean is never set, and the NM hangs during shutdown, repeating something like this every second in the log file:

2012-01-25 22:20:56,366 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AppLogAggregatorImpl: Waiting for aggregation to complete for application_1326848182580_2806
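
One common way to avoid this kind of hang is to set the completion flag in a finally block, so the waiter is released even when the worker dies from an uncaught exception. The following is a minimal sketch of that idea only; it is not the attached patch and not the real AppLogAggregatorImpl. The class and method names (AggregatorShutdownSketch, Aggregator, awaitFinished, runAndWait) are hypothetical, and the timeout on the wait is an extra safeguard assumed here for illustration.

```java
import java.util.concurrent.atomic.AtomicBoolean;

class AggregatorShutdownSketch {

    /** Hypothetical stand-in for a log aggregation worker. */
    static class Aggregator implements Runnable {
        private final AtomicBoolean finished = new AtomicBoolean(false);
        private final boolean failUnexpectedly;

        Aggregator(boolean failUnexpectedly) {
            this.failUnexpectedly = failUnexpectedly;
        }

        @Override
        public void run() {
            try {
                if (failUnexpectedly) {
                    // Simulates an unexpected failure in the worker thread.
                    throw new IllegalStateException("simulated aggregation failure");
                }
                // Real aggregation work would go here.
            } finally {
                // Always signal completion, even if the work above threw.
                finished.set(true);
                synchronized (this) {
                    notifyAll();
                }
            }
        }

        /** Waits until the worker signals completion or the timeout elapses. */
        boolean awaitFinished(long timeoutMillis) throws InterruptedException {
            long deadline = System.currentTimeMillis() + timeoutMillis;
            synchronized (this) {
                while (!finished.get()) {
                    long remaining = deadline - System.currentTimeMillis();
                    if (remaining <= 0) {
                        return false; // gave up waiting instead of hanging
                    }
                    wait(remaining);
                }
            }
            return true;
        }
    }

    /** Starts a worker and reports whether the waiter was released in time. */
    static boolean runAndWait(boolean failUnexpectedly) {
        Aggregator aggregator = new Aggregator(failUnexpectedly);
        Thread worker = new Thread(aggregator, "aggregator-sketch");
        // Suppress the stack trace of the simulated failure.
        worker.setUncaughtExceptionHandler((thread, error) -> { });
        worker.start();
        try {
            return aggregator.awaitFinished(5000);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("released after failure: " + runAndWait(true));
        System.out.println("released after success: " + runAndWait(false));
    }
}
```

In this sketch the flag is set before the lock is taken for notifyAll(), and the waiter checks the flag while holding the same lock, so a wakeup cannot be missed between the check and the wait.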

Attachments

- MAPREDUCE-3738.patch, 23/Feb/12 20:26, 5 kB, Jason Darrell Lowe
- livehistdump.txt, 26/Jan/12 22:57, 142 kB, Jason Darrell Lowe

Issue Links

is related to: MAPREDUCE-3143 Complete aggregation of user-logs spit out by containers onto DFS (Closed)

People

Assignee: Jason Darrell Lowe
Reporter: Jason Darrell Lowe
Votes: 0
Watchers: 4

Dates

Created: 26/Jan/12 20:57
Updated: 10/Mar/15 04:31
Resolved: 24/Feb/12 02:14