Uploaded image for project: 'Hadoop Map/Reduce'
  1. Hadoop Map/Reduce
  2. MAPREDUCE-3738

NM can hang during shutdown if AppLogAggregatorImpl thread dies unexpectedly

VotersWatch issueWatchersCreate sub-taskLinkCloneUpdate Comment AuthorReplace String in CommentUpdate Comment VisibilityDelete Comments
    XMLWordPrintableJSON

Details

    • Bug
    • Status: Resolved
    • Critical
    • Resolution: Fixed
    • 0.23.1, 2.0.0-alpha
    • 0.23.2
    • mrv2, nodemanager
    • None
    • Reviewed
    • Committed to trunk and branch-0.23. Thanks Jason.

    Description

      If an AppLogAggregator thread dies unexpectedly (e.g.: uncaught exception like OutOfMemoryError in the case I saw) then this will lead to a hang during nodemanager shutdown. The NM calls AppLogAggregatorImpl.join() during shutdown to make sure log aggregation has completed, and that method internally waits for an atomic boolean to be set by the log aggregation thread to indicate it has finished. Since the thread was killed off earlier due to an uncaught exception, the boolean will never be set and the NM hangs during shutdown repeating something like this every second in the log file:

      2012-01-25 22:20:56,366 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AppLogAggregatorImpl: Waiting for aggregation to complete for application_1326848182580_2806

      Attachments

        1. livehistdump.txt
          142 kB
          Jason Darrell Lowe
        2. MAPREDUCE-3738.patch
          5 kB
          Jason Darrell Lowe

        Issue Links

        Activity

          This comment will be Viewable by All Users Viewable by All Users
          Cancel

          People

            jlowe Jason Darrell Lowe
            jlowe Jason Darrell Lowe
            Votes:
            0 Vote for this issue
            Watchers:
            4 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved:

              Slack

                Issue deployment