Details
- Bug
- Status: Resolved
- Critical
- Resolution: Fixed
- 0.23.1, 2.0.0-alpha
- None
- Reviewed
- Committed to trunk and branch-0.23. Thanks Jason.
Description
If an AppLogAggregator thread dies unexpectedly (e.g., from an uncaught exception such as the OutOfMemoryError in the case I saw), the nodemanager hangs during shutdown. The NM calls AppLogAggregatorImpl.join() during shutdown to make sure log aggregation has completed, and that method internally waits for an atomic boolean that the log aggregation thread sets to indicate it has finished. Since the thread was already killed by the uncaught exception, the boolean is never set, and the NM hangs during shutdown, repeating something like this every second in the log file:
2012-01-25 22:20:56,366 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AppLogAggregatorImpl: Waiting for aggregation to complete for application_1326848182580_2806
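The failure mode described above can be reproduced in miniature. The following is only a sketch: `AggregatorHangSketch`, `SketchAggregator`, `runAndWait`, and `waitUntilFinished` are hypothetical names, not the real AppLogAggregatorImpl code, and the wait is bounded by a timeout so the demo terminates instead of hanging. It also shows one plausible remedy, setting the flag in a `finally` block, which is an assumption and not necessarily what the committed patch does.

```java
import java.util.concurrent.atomic.AtomicBoolean;

public class AggregatorHangSketch {

    /** Hypothetical stand-in for the aggregator thread body; not the real Hadoop class. */
    static class SketchAggregator implements Runnable {
        private final AtomicBoolean finished = new AtomicBoolean(false);
        private final boolean setFlagInFinally;

        SketchAggregator(boolean setFlagInFinally) {
            this.setFlagInFinally = setFlagInFinally;
        }

        @Override
        public void run() {
            if (setFlagInFinally) {
                try {
                    doAggregation();
                } finally {
                    // Runs even when doAggregation() throws, so waiters are released.
                    finished.set(true);
                }
            } else {
                doAggregation();
                // Never reached when doAggregation() throws: the flag stays false.
                finished.set(true);
            }
        }

        /** Simulates the unexpected failure (an OutOfMemoryError in the report). */
        private void doAggregation() {
            throw new IllegalStateException("simulated aggregation failure");
        }

        /** Polls the flag like the shutdown path does, but gives up after timeoutMs. */
        boolean waitUntilFinished(long timeoutMs) throws InterruptedException {
            long deadline = System.currentTimeMillis() + timeoutMs;
            while (!finished.get()) {
                if (System.currentTimeMillis() >= deadline) {
                    return false; // the real code has no timeout, hence the hang
                }
                Thread.sleep(10);
            }
            return true;
        }
    }

    /** Starts the worker, lets it die, then reports whether the flag was ever set. */
    public static boolean runAndWait(boolean setFlagInFinally, long timeoutMs)
            throws InterruptedException {
        SketchAggregator aggregator = new SketchAggregator(setFlagInFinally);
        Thread worker = new Thread(aggregator, "sketch-aggregator");
        // Swallow the simulated exception so it does not clutter stderr.
        worker.setUncaughtExceptionHandler((thread, error) -> { });
        worker.start();
        worker.join(); // the worker thread is dead after this point
        return aggregator.waitUntilFinished(timeoutMs);
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("flag set without finally: " + runAndWait(false, 200));
        System.out.println("flag set with finally:    " + runAndWait(true, 200));
    }
}
```

Without the `finally` block the wait times out because nothing is left alive to set the flag. With it, the waiter returns as soon as the worker dies.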
Attachments
Issue Links
- is related to
- MAPREDUCE-3143 Complete aggregation of user-logs spit out by containers onto DFS
- Closed