Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Blocker Blocker
    • Resolution: Fixed
    • Affects Version/s: 0.23.4
    • Fix Version/s: 2.0.3-alpha, 0.23.5
    • Component/s: resourcemanager
    • Labels:
      None

      Description

      we ran into a deadlock in the RM.

      =============================
      "1128743461@qtp-1252749669-5201":
      waiting for ownable synchronizer 0x00002aabbc87b960, (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync),
      which is held by "AsyncDispatcher event handler"
      "AsyncDispatcher event handler":
      waiting to lock monitor 0x00002ab0bba3a370 (object 0x00002aab3d4cd698, a org.apache.hadoop.yarn.api.records.impl.pb.AMResponsePBImpl),
      which is held by "IPC Server handler 36 on 8030"
      "IPC Server handler 36 on 8030":
      waiting for ownable synchronizer 0x00002aabbc87b960, (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync),
      which is held by "AsyncDispatcher event handler"
      Java stack information for the threads listed above:
      ===================================================
      "1128743461@qtp-1252749669-5201":
      at sun.misc.Unsafe.park(Native Method)

      • parking to wait for <0x00002aabbc87b960> (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158) at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:811)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireShared(AbstractQueuedSynchronizer.java:941) at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireShared(AbstractQueuedSynchronizer.java:1261)
        at java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.lock(ReentrantReadWriteLock.java:594) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.getFinalApplicationStatus(RMAppAttemptImpl.java:2
        95)
        at org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.getFinalApplicationStatus(RMAppImpl.java:222)
        at org.apache.hadoop.yarn.server.resourcemanager.webapp.RMWebServices.getApps(RMWebServices.java:328)
        at sun.reflect.GeneratedMethodAccessor41.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at com.sun.jersey.spi.container.JavaMethodInvokerFactory$1.invoke(JavaMethodInvokerFactory.java:60)
        at com.sun.jersey.server.impl.model.method.dispatch.AbstractResourceMethodDispatchProvider$TypeOutInvoker._dispatch(AbstractResourceMethodDispatchProvider.java:185)
        at com.sun.jersey.server.impl.model.method.dispatch.ResourceJavaMethodDispatcher.dispatch(ResourceJavaM
        ...
        ...
        ..

      "AsyncDispatcher event handler":
      at org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.unregisterAttempt(ApplicationMasterService.java:307)

      • waiting to lock <0x00002aab3d4cd698> (a org.apache.hadoop.yarn.api.records.impl.pb.AMResponsePBImpl)
        at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$BaseFinalTransition.transition(RMAppAttemptImpl.java:647)
        at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$FinalTransition.transition(RMAppAttemptImpl.java:809)
        at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$FinalTransition.transition(RMAppAttemptImpl.java:796)
        at org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:357)
        at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:298)
        at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43)
        at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:443)
      • locked <0x00002aabbb673090> (a org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine)
        at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:478)
        at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:81)
        at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:436)
        at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:417)
        at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:126)
        at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:75)
        at java.lang.Thread.run(Thread.java:619)
        "IPC Server handler 36 on 8030":
        at sun.misc.Unsafe.park(Native Method)
      • parking to wait for <0x00002aabbc87b960> (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:811)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:842)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1178)
        at java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.lock(ReentrantReadWriteLock.java:807)
        at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.pullJustFinishedContainers(RMAppAttemptImpl.java:437)
        at org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.allocate(ApplicationMasterService.java:285)
      • locked <0x00002aab3d4cd698> (a org.apache.hadoop.yarn.api.records.impl.pb.AMResponsePBImpl)
        at org.apache.hadoop.yarn.api.impl.pb.service.AMRMProtocolPBServiceImpl.allocate(AMRMProtocolPBServiceImpl.java:56)
        at org.apache.hadoop.yarn.proto.AMRMProtocol$AMRMProtocolService$2.callBlockingMethod(AMRMProtocol.java:87)
        at org.apache.hadoop.yarn.ipc.ProtoOverHadoopRpcEngine$Server.call(ProtoOverHadoopRpcEngine.java:353)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1528)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1524)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:396)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1212)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1522)
      1. YARN-189.patch
        3 kB
        Thomas Graves
      2. YARN-189.patch
        3 kB
        Thomas Graves

        Issue Links

          Activity

          Thomas Graves created issue -
          Thomas Graves made changes -
          Field Original Value New Value
          Attachment YARN-189.patch [ 12551348 ]
          Thomas Graves made changes -
          Status Open [ 1 ] Patch Available [ 10002 ]
          Vinod Kumar Vavilapalli made changes -
          Priority Critical [ 2 ] Blocker [ 1 ]
          Thomas Graves made changes -
          Attachment YARN-189.patch [ 12551540 ]
          Vinod Kumar Vavilapalli made changes -
          Status Patch Available [ 10002 ] Resolved [ 5 ]
          Fix Version/s 2.0.3-alpha [ 12323272 ]
          Fix Version/s 0.23.5 [ 12323311 ]
          Resolution Fixed [ 1 ]
          Vinod Kumar Vavilapalli made changes -
          Link This issue relates to MAPREDUCE-4761 [ MAPREDUCE-4761 ]
          Thomas Graves made changes -
          Status Resolved [ 5 ] Closed [ 6 ]
          Devaraj K made changes -
          Link This issue duplicates YARN-452 [ YARN-452 ]

            People

            • Assignee:
              Thomas Graves
              Reporter:
              Thomas Graves
            • Votes:
              0 Vote for this issue
              Watchers:
              13 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development