Uploaded image for project: 'Hadoop YARN'
  1. Hadoop YARN
  2. YARN-3878

AsyncDispatcher can hang while stopping if it is configured for draining events on stop

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 2.7.0
    • Fix Version/s: 2.8.0, 2.7.2, 2.6.3, 3.0.0-alpha1
    • Component/s: None
    • Labels:
      None
    • Target Version/s:
    • Hadoop Flags:
      Reviewed

      Description

      The sequence of events is as under :

      1. RM is stopped while putting a RMStateStore Event to RMStateStore's AsyncDispatcher. This leads to an Interrupted Exception being thrown.
      2. As RM is being stopped, RMStateStore's AsyncDispatcher is also stopped. On serviceStop, we will check if all events have been drained and wait for event queue to drain(as RM State Store dispatcher is configured for queue to drain on stop).
      3. This condition never becomes true and AsyncDispatcher keeps on waiting incessantly for dispatcher event queue to drain till JVM exits.

      Initial exception while posting RM State store event to queue

      2015-06-27 20:08:35,922 DEBUG [main] service.AbstractService (AbstractService.java:enterState(452)) - Service: Dispatcher entered state STOPPED
      2015-06-27 20:08:35,923 WARN  [AsyncDispatcher event handler] event.AsyncDispatcher (AsyncDispatcher.java:handle(247)) - AsyncDispatcher thread interrupted
      java.lang.InterruptedException
      	at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly(AbstractQueuedSynchronizer.java:1219)
      	at java.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:340)
      	at java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:338)
      	at org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:244)
      	at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.updateApplicationAttemptState(RMStateStore.java:652)
      	at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.rememberTargetTransitionsAndStoreState(RMAppAttemptImpl.java:1173)
      	at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.access$3300(RMAppAttemptImpl.java:109)
      	at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$ContainerFinishedTransition.transition(RMAppAttemptImpl.java:1650)
      	at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$ContainerFinishedTransition.transition(RMAppAttemptImpl.java:1619)
      	at org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385)
      	at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
      	at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
      	at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
      	at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:786)
      	at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:108)
      	at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:838)
      

      JStack of AsyncDispatcher hanging on stop

      "AsyncDispatcher event handler" prio=10 tid=0x00007fb980222800 nid=0x4b1e waiting on condition [0x00007fb9654e9000]
         java.lang.Thread.State: WAITING (parking)
              at sun.misc.Unsafe.park(Native Method)
              - parking to wait for  <0x0000000700b79250> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
              at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
              at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)
              at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
              at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:113)
              at java.lang.Thread.run(Thread.java:744)
      
      "main" prio=10 tid=0x00007fb98000a800 nid=0x49c3 in Object.wait() [0x00007fb989851000]
         java.lang.Thread.State: TIMED_WAITING (on object monitor)
      	at java.lang.Object.wait(Native Method)
      	- waiting on <0x0000000700b79430> (a java.lang.Object)
      	at org.apache.hadoop.yarn.event.AsyncDispatcher.serviceStop(AsyncDispatcher.java:156)
      	- locked <0x0000000700b79430> (a java.lang.Object)
      	at org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221)
      	- locked <0x0000000700b79420> (a java.lang.Object)
      	at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.serviceStop(RMStateStore.java:515)
      	at org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221)
      	- locked <0x0000000700b79630> (a java.lang.Object)
      	at org.apache.hadoop.service.AbstractService.close(AbstractService.java:250)
      	at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStop(ResourceManager.java:599)
      

      We keep on getting below logs

      2015-06-27 20:08:35,926 INFO  [main] event.AsyncDispatcher (AsyncDispatcher.java:serviceStop(140)) - AsyncDispatcher is draining to stop, igonring any new events.
      2015-06-27 20:08:36,926 INFO  [main] event.AsyncDispatcher (AsyncDispatcher.java:serviceStop(144)) - Waiting for AsyncDispatcher to drain. Thread state is :WAITING
      2015-06-27 20:08:37,927 INFO  [main] event.AsyncDispatcher (AsyncDispatcher.java:serviceStop(144)) - Waiting for AsyncDispatcher to drain. Thread state is :WAITING
      2015-06-27 20:08:38,927 INFO  [main] event.AsyncDispatcher (AsyncDispatcher.java:serviceStop(144)) - Waiting for AsyncDispatcher to drain. Thread state is :WAITING
      2015-06-27 20:08:39,928 INFO  [main] event.AsyncDispatcher (AsyncDispatcher.java:serviceStop(144)) - Waiting for AsyncDispatcher to drain. Thread state is :WAITING
      2015-06-27 20:08:40,929 INFO  [main] event.AsyncDispatcher (AsyncDispatcher.java:serviceStop(144)) - Waiting for AsyncDispatcher to drain. Thread state is :WAITING
      2015-06-27 20:08:41,929 INFO  [main] event.AsyncDispatcher (AsyncDispatcher.java:serviceStop(144)) - Waiting for AsyncDispatcher to drain. Thread state is :WAITING
      2015-06-27 20:08:42,930 INFO  [main] event.AsyncDispatcher (AsyncDispatcher.java:serviceStop(144)) - Waiting for AsyncDispatcher to drain. Thread state is :WAITING
      2015-06-27 20:08:43,930 INFO  [main] event.AsyncDispatcher (AsyncDispatcher.java:serviceStop(144)) - Waiting for AsyncDispatcher to drain. Thread state is :WAITING
      2015-06-27 20:08:44,931 INFO  [main] event.AsyncDispatcher (AsyncDispatcher.java:serviceStop(144)) - Waiting for AsyncDispatcher to drain. Thread state is :WAITING
      2015-06-27 20:08:45,931 INFO  [main] event.AsyncDispatcher (AsyncDispatcher.java:serviceStop(144)) - Waiting for AsyncDispatcher to drain. Thread state is :WAITING
      2015-06-27 20:08:46,932 INFO  [main] event.AsyncDispatcher (AsyncDispatcher.java:serviceStop(144)) - Waiting for AsyncDispatcher to drain. Thread state is :WAITING
      

        Attachments

        1. YARN-3878-branch-2.6.01.patch
          5 kB
          Varun Saxena
        2. YARN-3878.09.patch
          5 kB
          Varun Saxena
        3. YARN-3878.09_reprorace.pat_h
          7 kB
          Anubhav Dhoot
        4. YARN-3878.08.patch
          7 kB
          Varun Saxena
        5. YARN-3878.07.patch
          7 kB
          Varun Saxena
        6. YARN-3878.06.patch
          7 kB
          Varun Saxena
        7. YARN-3878.05.patch
          7 kB
          Varun Saxena
        8. YARN-3878.04.patch
          6 kB
          Varun Saxena
        9. YARN-3878.03.patch
          6 kB
          Varun Saxena
        10. YARN-3878.02.patch
          5 kB
          Varun Saxena
        11. YARN-3878.01.patch
          2 kB
          Varun Saxena

          Issue Links

            Activity

              People

              • Assignee:
                varun_saxena Varun Saxena
                Reporter:
                varun_saxena Varun Saxena
              • Votes:
                0 Vote for this issue
                Watchers:
                12 Start watching this issue

                Dates

                • Created:
                  Updated:
                  Resolved: