Apache Tez / TEZ-3117

Deadlock in Edge and Vertex code


Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.7.1, 0.8.3
    • Component/s: None
    • Labels: None

    Description

      Java-level deadlocks detected
       
      This means that some threads are blocked waiting to enter a synchronization block or
      waiting to reenter a synchronization block after an Object.wait() call, where each thread
      owns one monitor while trying to obtain another monitor already held by another thread.
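
      A check of this kind can be reproduced from inside a JVM with the standard
      java.lang.management API. The sketch below is only an illustration and is not
      taken from the tool that produced this report; the class name DeadlockProbe and
      the method countDeadlockedThreads are hypothetical. It uses
      ThreadMXBean.findDeadlockedThreads(), which considers both object monitors and
      ownable synchronizers such as ReentrantReadWriteLock, the two kinds of lock
      involved here.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

// Hypothetical helper, not part of Tez: asks the JVM for deadlocked threads.
class DeadlockProbe {

    // Returns the number of threads the JVM currently reports as deadlocked.
    static int countDeadlockedThreads() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        // Returns null (not an empty array) when no deadlock is found.
        long[] ids = mx.findDeadlockedThreads();
        if (ids == null) {
            return 0;
        }
        // Print each deadlocked thread with its locked monitors and synchronizers.
        for (ThreadInfo info : mx.getThreadInfo(ids, true, true)) {
            if (info != null) {
                System.out.print(info);
            }
        }
        return ids.length;
    }

    public static void main(String[] args) {
        System.out.println("deadlocked threads: " + countDeadlockedThreads());
    }
}
```

      In a healthy process the helper returns 0; in the situation described in this
      issue it would be expected to report the two threads listed below.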
       
      Deadlock:
      
      
      App Shared Pool - #1 is waiting to lock java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync@18a7c819 which is held by Dispatcher thread {Central}
      Dispatcher thread {Central} is waiting to lock org.apache.tez.dag.app.dag.impl.Edge@3e6ba2db which is held by App Shared Pool - #1
      
      
       
      
      
      Thread stacks
      
      
      App Shared Pool - #1 [WAITING]
       sun.misc.Unsafe.park(native method)
       java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
       java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
       java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireShared(AbstractQueuedSynchronizer.java:967)
       java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireShared(AbstractQueuedSynchronizer.java:1283)
       java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.lock(ReentrantReadWriteLock.java:727)
       org.apache.tez.dag.app.dag.impl.VertexImpl.getTotalTasks(VertexImpl.java:1098)
       org.apache.tez.dag.app.dag.impl.Edge$EdgeManagerPluginContextImpl.getDestinationVertexNumTasks(Edge.java:99)
       org.apache.tez.dag.app.dag.impl.Edge.routingToBegin(Edge.java:214)
       org.apache.tez.dag.app.dag.impl.VertexImpl.setupEdgeRouting(VertexImpl.java:1447)
       org.apache.tez.dag.app.dag.impl.VertexImpl.unsetTasksNotYetScheduled(VertexImpl.java:1453)
       org.apache.tez.dag.app.dag.impl.VertexImpl.scheduleTasks(VertexImpl.java:1496)
       org.apache.tez.dag.app.dag.impl.VertexManager$VertexManagerPluginContextImpl.scheduleTasks(VertexManager.java:216)
       org.apache.tez.dag.library.vertexmanager.InputReadyVertexManager.handleSourceTaskFinished(InputReadyVertexManager.java:275)
       org.apache.tez.dag.library.vertexmanager.InputReadyVertexManager.onSourceTaskCompleted(InputReadyVertexManager.java:196)
       org.apache.tez.dag.library.vertexmanager.InputReadyVertexManager.trySchedulingPendingCompletions(InputReadyVertexManager.java:146)
       org.apache.tez.dag.library.vertexmanager.InputReadyVertexManager.onVertexStarted(InputReadyVertexManager.java:187)
       org.apache.tez.dag.app.dag.impl.VertexManager$VertexManagerEventOnVertexStarted.invoke(VertexManager.java:578)
       org.apache.tez.dag.app.dag.impl.VertexManager$VertexManagerEvent$1.run(VertexManager.java:647)
       org.apache.tez.dag.app.dag.impl.VertexManager$VertexManagerEvent$1.run(VertexManager.java:642)
       java.security.AccessController.doPrivileged(native method)
       javax.security.auth.Subject.doAs(Subject.java:422)
       org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
       org.apache.tez.dag.app.dag.impl.VertexManager$VertexManagerEvent.call(VertexManager.java:642)
       org.apache.tez.dag.app.dag.impl.VertexManager$VertexManagerEvent.call(VertexManager.java:631)
       java.util.concurrent.FutureTask.run(FutureTask.java:266)
       java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
       java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
       java.lang.Thread.<null>(unknown source)
      
      
      Dispatcher thread {Central} [BLOCKED; waiting to lock org.apache.tez.dag.app.dag.impl.Edge@3e6ba2db]
       org.apache.tez.dag.app.dag.impl.Edge.getEdgeProperty(Edge.java:241)
       org.apache.tez.dag.app.dag.impl.VertexImpl.logVertexConfigurationDoneEvent(VertexImpl.java:1886)
       org.apache.tez.dag.app.dag.impl.VertexImpl.maybeSendConfiguredEvent(VertexImpl.java:3020)
       org.apache.tez.dag.app.dag.impl.VertexImpl.startVertex(VertexImpl.java:3055)
       org.apache.tez.dag.app.dag.impl.VertexImpl.access$4500(VertexImpl.java:204)
       org.apache.tez.dag.app.dag.impl.VertexImpl$StartTransition.transition(VertexImpl.java:3007)
       org.apache.tez.dag.app.dag.impl.VertexImpl$StartTransition.transition(VertexImpl.java:2996)
       org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385)
       org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
       org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
       org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
       org.apache.tez.state.StateMachineTez.doTransition(StateMachineTez.java:59)
       org.apache.tez.dag.app.dag.impl.VertexImpl.handle(VertexImpl.java:1799)
       org.apache.tez.dag.app.dag.impl.VertexImpl.handle(VertexImpl.java:203)
       org.apache.tez.dag.app.DAGAppMaster$VertexEventDispatcher.handle(DAGAppMaster.java:2214)
       org.apache.tez.dag.app.DAGAppMaster$VertexEventDispatcher.handle(DAGAppMaster.java:2200)
       org.apache.tez.common.AsyncDispatcher.dispatch(AsyncDispatcher.java:183)
       org.apache.tez.common.AsyncDispatcher$1.run(AsyncDispatcher.java:114)
       java.lang.Thread.<null>(unknown source)
      
      
      Frozen threads found (potential deadlock)
       
      It seems that the following threads have not changed their stack for more than 10 seconds.
      These threads are possibly (but not necessarily!) in a deadlock or hung.
       
      client DomainSocketWatcher <--- Frozen for at least 20m 33 sec
      org.apache.hadoop.net.unix.DomainSocketWatcher.doPoll0(int, DomainSocketWatcher$FdSet) DomainSocketWatcher.java (native)
      org.apache.hadoop.net.unix.DomainSocketWatcher.access$900(int, DomainSocketWatcher$FdSet) DomainSocketWatcher.java:52
      org.apache.hadoop.net.unix.DomainSocketWatcher$2.run() DomainSocketWatcher.java:511
      java.lang.Thread.run() Thread.java:745
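
      The two stacks above show a lock-order inversion: the pool thread holds the
      Edge monitor and waits for the vertex read lock, while the dispatcher thread
      holds the vertex lock (presumably its write lock, taken in VertexImpl.handle)
      and waits for the Edge monitor. The following is a minimal sketch of that
      pattern, not Tez code and not the fix from the attached patch. All names are
      hypothetical, and a ReentrantLock stands in for the synchronized Edge monitor
      so that the second acquisition can use a timed tryLock and the demo terminates
      instead of hanging.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical demo of the lock-order inversion; not taken from Tez.
class LockOrderInversionDemo {

    // Holds `first`, waits until the peer thread holds its own first lock,
    // then makes a bounded attempt on `second`. Returns true if it got it.
    static boolean holdThenTry(Lock first, Lock second,
                               CountDownLatch bothHold, CountDownLatch bothTried) {
        first.lock();
        try {
            bothHold.countDown();
            bothHold.await();
            boolean got = second.tryLock(200, TimeUnit.MILLISECONDS);
            if (got) {
                second.unlock();
            }
            // Keep `first` held until the peer has finished its attempt too.
            bothTried.countDown();
            bothTried.await();
            return got;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            first.unlock();
        }
    }

    // Returns true when neither thread could take its second lock, i.e. the
    // two threads would have deadlocked with unbounded lock() calls.
    static boolean bothThreadsBlocked() {
        Lock edgeMonitor = new ReentrantLock(); // stand-in for synchronized Edge
        ReentrantReadWriteLock vertexLock = new ReentrantReadWriteLock();
        CountDownLatch bothHold = new CountDownLatch(2);
        CountDownLatch bothTried = new CountDownLatch(2);
        boolean[] acquired = new boolean[2];

        // Pool thread order: Edge monitor first, then vertex read lock.
        Thread pool = new Thread(() -> acquired[0] = holdThenTry(
                edgeMonitor, vertexLock.readLock(), bothHold, bothTried),
                "App Shared Pool - #1");
        // Dispatcher order: vertex write lock first, then Edge monitor.
        Thread dispatcher = new Thread(() -> acquired[1] = holdThenTry(
                vertexLock.writeLock(), edgeMonitor, bothHold, bothTried),
                "Dispatcher thread {Central}");
        pool.start();
        dispatcher.start();
        try {
            pool.join();
            dispatcher.join();
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
        return !acquired[0] && !acquired[1];
    }

    public static void main(String[] args) {
        System.out.println("both blocked: " + bothThreadsBlocked());
    }
}
```

      The usual remedies for this pattern are to acquire the two locks in the same
      order on every code path, or to avoid calling into the vertex while the Edge
      monitor is held. Which approach the attached patch takes is not stated in this
      report.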
      
      
      
      
      

      Attachments

        1. TEZ-3117.1.patch
          4 kB
          Bikas Saha

          People

            Assignee: Bikas Saha (bikassaha)
            Reporter: Yesha Vora (yeshavora)
