Hadoop Map/Reduce
MAPREDUCE-3738

NM can hang during shutdown if AppLogAggregatorImpl thread dies unexpectedly

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 0.23.1, 2.0.0-alpha
    • Fix Version/s: 0.23.2
    • Component/s: mrv2, nodemanager
    • Labels:
      None
    • Target Version/s:
    • Hadoop Flags:
      Reviewed
    • Release Note:
      Committed to trunk and branch-0.23. Thanks Jason.

      Description

      If an AppLogAggregator thread dies unexpectedly (e.g., from an uncaught exception such as the OutOfMemoryError in the case I saw), the nodemanager hangs during shutdown. The NM calls AppLogAggregatorImpl.join() during shutdown to make sure log aggregation has completed, and that method internally waits for an atomic boolean that the log aggregation thread sets to indicate it has finished. Since the thread was killed earlier by the uncaught exception, the boolean is never set, and the NM hangs during shutdown, repeating something like this every second in the log file:

      2012-01-25 22:20:56,366 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AppLogAggregatorImpl: Waiting for aggregation to complete for application_1326848182580_2806
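
      The failure mode described above can be illustrated with a minimal, self-contained sketch. This is not the code from the attached patch and not the real AppLogAggregatorImpl: the class SketchAggregator, the methods awaitFinished and runAndJoin, and the guardWithFinally switch are all hypothetical names made up for illustration. The sketch assumes the completion flag is set as the last step of the thread's run method, and shows one possible remedy (setting the flag in a finally block). The wait is bounded here only so that the demo terminates; the wait described in this issue is unbounded.

```java
import java.util.concurrent.atomic.AtomicBoolean;

public class AggregatorShutdownSketch {

    /** Hypothetical stand-in for the aggregator; NOT the real AppLogAggregatorImpl. */
    static class SketchAggregator implements Runnable {
        private final AtomicBoolean finished = new AtomicBoolean(false);
        private final Runnable work;
        private final boolean guardWithFinally;

        SketchAggregator(Runnable work, boolean guardWithFinally) {
            this.work = work;
            this.guardWithFinally = guardWithFinally;
        }

        @Override
        public void run() {
            if (guardWithFinally) {
                try {
                    work.run();
                } finally {
                    // Runs even if work throws, so a waiter is always released.
                    finished.set(true);
                }
            } else {
                work.run();
                // Skipped when work throws: the flag stays false forever.
                finished.set(true);
            }
        }

        /** Polls the flag a bounded number of times; returns whether it was set. */
        boolean awaitFinished(int maxPolls, long pollMillis) throws InterruptedException {
            for (int i = 0; i < maxPolls; i++) {
                if (finished.get()) {
                    return true;
                }
                Thread.sleep(pollMillis);
            }
            return finished.get();
        }
    }

    /** Starts an aggregator whose work always fails, then waits on its flag. */
    static boolean runAndJoin(boolean guardWithFinally) {
        SketchAggregator agg = new SketchAggregator(
            () -> { throw new IllegalStateException("simulated aggregation failure"); },
            guardWithFinally);
        Thread t = new Thread(agg, "sketch-aggregator");
        // Swallow the simulated failure so the demo output stays readable.
        t.setUncaughtExceptionHandler((thread, err) -> { });
        t.start();
        try {
            t.join(); // the worker thread is dead after this returns
            return agg.awaitFinished(5, 10);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("unguarded wait completed: " + runAndJoin(false)); // false
        System.out.println("guarded wait completed: " + runAndJoin(true));    // true
    }
}
```

      In the unguarded variant the worker thread has already died, so no amount of waiting will ever see the flag set. With an unbounded wait, that is the shutdown hang reported here.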

      1. MAPREDUCE-3738.patch
        5 kB
        Jason Lowe
      2. livehistdump.txt
        142 kB
        Jason Lowe

        Issue Links

          Activity

          Jason Lowe created issue
          Mahadev konar made changes
          Field Original Value New Value
          Priority Major [ 3 ] Critical [ 2 ]
          Jason Lowe made changes
          Attachment livehistdump.txt [ 12512051 ]
          Jason Lowe made changes
          Assignee Jason Lowe [ jlowe ]
          Jason Lowe made changes
          Attachment MAPREDUCE-3738.patch [ 12515804 ]
          Jason Lowe made changes
          Status Open [ 1 ] Patch Available [ 10002 ]
          Target Version/s 0.24.0, 0.23.2 [ 12317654, 12319851 ]
          Siddharth Seth made changes
          Status Patch Available [ 10002 ] Resolved [ 5 ]
          Hadoop Flags Reviewed [ 10343 ]
          Release Note Committed to trunk and branch-0.23. Thanks Jason.
          Target Version/s 0.23.2, 0.24.0 [ 12319851, 12317654 ] 0.24.0, 0.23.2 [ 12317654, 12319851 ]
          Fix Version/s 0.23.2 [ 12319851 ]
          Resolution Fixed [ 1 ]
          Vinod Kumar Vavilapalli made changes
          Link This issue is related to MAPREDUCE-3143 [ MAPREDUCE-3143 ]
          Allen Wittenauer made changes
          Affects Version/s 2.0.0-alpha [ 12320354 ]
          Affects Version/s 0.24.0 [ 12317654 ]

            People

            • Assignee:
              Jason Lowe
              Reporter:
              Jason Lowe
            • Votes:
              0
              Watchers:
              4
