Apache Tez / TEZ-2359

Deadlock in DAGAppMaster

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Invalid

    Description

      Found one Java-level deadlock:
      =============================
      "Timer-1":
        waiting for ownable synchronizer 0x00000007cd0f8a30, (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync),
        which is held by "Dispatcher thread: Central"
      "Dispatcher thread: Central":
        waiting to lock monitor 0x00007fb829866d18 (object 0x00000007cd5ab958, a org.apache.tez.dag.app.rm.YarnTaskSchedulerService),
        which is held by "DelayedContainerManager"
      "DelayedContainerManager":
        waiting for ownable synchronizer 0x00000007cd0f8a30, (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync),
        which is held by "Dispatcher thread: Central"
      
      Java stack information for the threads listed above:
      ===================================================
      "Timer-1":
      	at sun.misc.Unsafe.park(Native Method)
      	- parking to wait for  <0x00000007cd0f8a30> (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
      	at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
      	at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:834)
      	at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:867)
      	at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1197)
      	at java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.lock(ReentrantReadWriteLock.java:945)
      	at org.apache.tez.dag.app.DAGAppMaster.checkAndHandleSessionTimeout(DAGAppMaster.java:2015)
      	- locked <0x00000007cd0f2ff0> (a org.apache.tez.dag.app.DAGAppMaster)
      	at org.apache.tez.dag.app.DAGAppMaster$3.run(DAGAppMaster.java:1825)
      	at java.util.TimerThread.mainLoop(Timer.java:555)
      	at java.util.TimerThread.run(Timer.java:505)
      "Dispatcher thread: Central":
      	at org.apache.tez.dag.app.rm.YarnTaskSchedulerService.dagComplete(YarnTaskSchedulerService.java:842)
      	- waiting to lock <0x00000007cd5ab958> (a org.apache.tez.dag.app.rm.YarnTaskSchedulerService)
      	at org.apache.tez.dag.app.rm.TaskSchedulerEventHandler.dagCompleted(TaskSchedulerEventHandler.java:566)
      	at org.apache.tez.dag.app.DAGAppMaster.checkForCompletion(DAGAppMaster.java:832)
      	at org.apache.tez.dag.app.DAGAppMaster.access$4800(DAGAppMaster.java:201)
      	at org.apache.tez.dag.app.DAGAppMaster$DAGFinishedTransition.transition(DAGAppMaster.java:2362)
      	at org.apache.tez.dag.app.DAGAppMaster$DAGFinishedTransition.transition(DAGAppMaster.java:2356)
      	at org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385)
      	at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
      	at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
      	at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
      	- locked <0x00000007cd1d0208> (a org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine)
      	at org.apache.tez.dag.app.DAGAppMaster.handle(DAGAppMaster.java:510)
      	at org.apache.tez.dag.app.DAGAppMaster$DAGAppMasterEventHandler.handle(DAGAppMaster.java:879)
      	at org.apache.tez.dag.app.DAGAppMaster$DAGAppMasterEventHandler.handle(DAGAppMaster.java:868)
      	at org.apache.tez.common.AsyncDispatcher.dispatch(AsyncDispatcher.java:182)
      	at org.apache.tez.common.AsyncDispatcher$1.run(AsyncDispatcher.java:113)
      	at java.lang.Thread.run(Thread.java:745)
      "DelayedContainerManager":
      	at sun.misc.Unsafe.park(Native Method)
      	- parking to wait for  <0x00000007cd0f8a30> (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
      	at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
      	at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:834)
      	at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireShared(AbstractQueuedSynchronizer.java:964)
      	at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireShared(AbstractQueuedSynchronizer.java:1282)
      	at java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.lock(ReentrantReadWriteLock.java:731)
      	at org.apache.tez.dag.app.DAGAppMaster.getState(DAGAppMaster.java:531)
      	at org.apache.tez.dag.app.DAGAppMaster$RunningAppContext.getAMState(DAGAppMaster.java:1522)
      	at org.apache.tez.dag.app.rm.YarnTaskSchedulerService.assignDelayedContainer(YarnTaskSchedulerService.java:585)
      	- locked <0x00000007cd5ab958> (a org.apache.tez.dag.app.rm.YarnTaskSchedulerService)
      	at org.apache.tez.dag.app.rm.YarnTaskSchedulerService.access$600(YarnTaskSchedulerService.java:82)
      	at org.apache.tez.dag.app.rm.YarnTaskSchedulerService$DelayedContainerManager.run(YarnTaskSchedulerService.java:1877)
      	- locked <0x00000007cd5ab958> (a org.apache.tez.dag.app.rm.YarnTaskSchedulerService)
      
      Found 1 deadlock.
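
The cycle in the dump above involves two threads. "Dispatcher thread: Central" holds the ReentrantReadWriteLock exclusively (its write lock) and waits for the YarnTaskSchedulerService monitor, while "DelayedContainerManager" holds that monitor and waits for the read lock of the same ReentrantReadWriteLock. "Timer-1" is queued behind the same lock. The following is a minimal, self-contained sketch of that lock-order inversion, not Tez code: the class name, thread names, and both lock objects are hypothetical stand-ins, and it assumes a JVM that supports ownable-synchronizer monitoring (as HotSpot does).

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical demo class; it only mirrors the lock shapes seen in the dump.
public class LockOrderInversionDemo {

    /**
     * Starts two daemon threads that take the same two locks in opposite
     * order, then polls the JVM until it reports a deadlock.
     * Returns the number of deadlocked threads, or 0 if none is seen in time.
     */
    public static int deadlockedThreadCount() throws InterruptedException {
        // Stand-in for the read/write lock guarding the AM state.
        final ReentrantReadWriteLock stateLock = new ReentrantReadWriteLock();
        // Stand-in for the scheduler object used as a monitor.
        final Object schedulerMonitor = new Object();
        // Ensures each thread holds its first lock before asking for the second.
        final CountDownLatch bothHoldFirstLock = new CountDownLatch(2);

        // Order 1: write lock first, then the monitor.
        Thread dispatcher = new Thread(() -> {
            stateLock.writeLock().lock();
            try {
                bothHoldFirstLock.countDown();
                awaitQuietly(bothHoldFirstLock);
                synchronized (schedulerMonitor) {
                    // Never reached: the monitor is held by the other thread.
                }
            } finally {
                stateLock.writeLock().unlock();
            }
        }, "demo-dispatcher");

        // Order 2: monitor first, then the read lock.
        Thread containerManager = new Thread(() -> {
            synchronized (schedulerMonitor) {
                bothHoldFirstLock.countDown();
                awaitQuietly(bothHoldFirstLock);
                stateLock.readLock().lock(); // blocks: write lock is held
                stateLock.readLock().unlock();
            }
        }, "demo-container-manager");

        // Daemon threads let the JVM exit even though they stay deadlocked.
        dispatcher.setDaemon(true);
        containerManager.setDaemon(true);
        dispatcher.start();
        containerManager.start();

        // findDeadlockedThreads() covers monitors and ownable synchronizers.
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        for (int attempt = 0; attempt < 100; attempt++) {
            long[] ids = threads.findDeadlockedThreads();
            if (ids != null) {
                return ids.length;
            }
            Thread.sleep(50);
        }
        return 0;
    }

    private static void awaitQuietly(CountDownLatch latch) {
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("deadlocked threads: " + deadlockedThreadCount());
    }
}
```

In this sketch the cycle disappears if both threads acquire the two locks in the same order, or if the state is read before the monitor is entered. Whether either change is appropriate for the Tez code paths in the dump is not established by this report.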
      

            People

              Assignee: Unassigned
              Reporter: Jeff Zhang (zjffdu)