Details

    • Type: Bug
    • Status: Closed
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.8.0, 2.7.2, 2.6.3, 3.0.0-alpha1
    • Component/s: None
    • Labels: None
    • Hadoop Flags: Reviewed

      Description

      yarn@XXX:/mnt/hadoopqe$ /usr/hdp/current/hadoop-yarn-client/bin/yarn application -list -appStates NEW,NEW_SAVING,SUBMITTED,ACCEPTED,RUNNING
      15/12/04 21:59:54 INFO impl.TimelineClientImpl: Timeline service address: http://XXX:8188/ws/v1/timeline/
      15/12/04 21:59:54 INFO client.RMProxy: Connecting to ResourceManager at XXX/0.0.0.0:8050
      15/12/04 21:59:55 INFO client.AHSProxy: Connecting to Application History server at XXX/0.0.0.0:10200
      
      RM log
      2015-12-04 21:59:19,744 INFO  event.AsyncDispatcher (AsyncDispatcher.java:handle(243)) - Size of event-queue is 237000
      2015-12-04 22:00:50,945 INFO  event.AsyncDispatcher (AsyncDispatcher.java:handle(243)) - Size of event-queue is 238000
      2015-12-04 22:02:22,416 INFO  event.AsyncDispatcher (AsyncDispatcher.java:handle(243)) - Size of event-queue is 239000
      2015-12-04 22:03:53,593 INFO  event.AsyncDispatcher (AsyncDispatcher.java:handle(243)) - Size of event-queue is 240000
      2015-12-04 22:05:24,856 INFO  event.AsyncDispatcher (AsyncDispatcher.java:handle(243)) - Size of event-queue is 241000
      2015-12-04 22:06:56,235 INFO  event.AsyncDispatcher (AsyncDispatcher.java:handle(243)) - Size of event-queue is 242000
      2015-12-04 22:08:27,510 INFO  event.AsyncDispatcher (AsyncDispatcher.java:handle(243)) - Size of event-queue is 243000
      2015-12-04 22:09:58,786 INFO  event.AsyncDispatcher (AsyncDispatcher.java:handle(243)) - Size of event-queue is 244000
      

        Activity

        templedf Daniel Templeton added a comment -

        Any other details you can share, such as how many applications are running in the system and what states they're in?

        Naganarasimha Naganarasimha G R added a comment -

        Yesha Vora
        A similar issue was raised earlier. It would be helpful to know the version in which this was reproduced and the scenario: were all the nodes restarted, and what is the size of the cluster?

        jianhe Jian He added a comment -

        This is a similar problem to YARN-2594
        Thread 1

        Thread 53785: (state = BLOCKED)
         - sun.misc.Unsafe.park(boolean, long) @bci=0 (Compiled frame; information may be imprecise)
         - java.util.concurrent.locks.LockSupport.park(java.lang.Object) @bci=14, line=186 (Interpreted frame)
         - java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt() @bci=1, line=834 (Interpreted frame)
         - java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireShared(int) @bci=83, line=964 (Interpreted frame)
         - java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireShared(int) @bci=10, line=1282 (Interpreted frame)
         - java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.lock() @bci=5, line=731 (Interpreted frame)
         - org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.getFinalApplicationStatus() @bci=4, line=478 (Interpreted frame)
         - org.apache.hadoop.yarn.server.resourcemanager.metrics.SystemMetricsPublisher.appAttemptFinished(org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttempt, org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptState, org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMApp, long) @bci=45, line=162 (Interpreted frame)
         - org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$BaseFinalTransition.transition(org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl, org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptEvent) @bci=288, line=1300 (Interpreted frame)
         - org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$FinalTransition.transition(org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl, org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptEvent) @bci=9, line=1493 (Interpreted frame)
         - org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$FinalTransition.transition(java.lang.Object, java.lang.Object) @bci=9, line=1480 (Interpreted frame)
         - org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$FinalStateSavedTransition.transition(org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl, org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptEvent) @bci=24, line=1213 (Interpreted frame)
         - org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$FinalStateSavedTransition.transition(java.lang.Object, java.lang.Object) @bci=9, line=1205 (Interpreted frame)
         - org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(java.lang.Object, java.lang.Enum, java.lang.Object, java.lang.Enum) @bci=6, line=385 (Interpreted frame)
         - org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(java.lang.Object, java.lang.Enum, java.lang.Enum, java.lang.Object) @bci=45, line=302 (Interpreted frame)
         - org.apache.hadoop.yarn.state.StateMachineFactory.access$300(org.apache.hadoop.yarn.state.StateMachineFactory, java.lang.Object, java.lang.Enum, java.lang.Enum, java.lang.Object) @bci=6, line=46 (Interpreted frame)
         - org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(java.lang.Enum, java.lang.Object) @bci=15, line=448 (Interpreted frame)
         - org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptEvent) @bci=65, line=784 (Interpreted frame)
         - org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(org.apache.hadoop.yarn.event.Event) @bci=5, line=106 (Interpreted frame)
         - org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptEvent) @bci=53, line=815 (Interpreted frame)
         - org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(org.apache.hadoop.yarn.event.Event) @bci=5, line=796 (Interpreted frame)
         - org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(org.apache.hadoop.yarn.event.Event) @bci=88, line=183 (Interpreted frame)
         - org.apache.hadoop.yarn.event.AsyncDispatcher$1.run() @bci=140, line=109 (Interpreted frame)
        

        Thread 2

        Thread 25723: (state = BLOCKED)
         - sun.misc.Unsafe.park(boolean, long) @bci=0 (Compiled frame; information may be imprecise)
         - java.util.concurrent.locks.LockSupport.park(java.lang.Object) @bci=14, line=186 (Interpreted frame)
         - java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt() @bci=1, line=834 (Interpreted frame)
         - java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireShared(int) @bci=83, line=964 (Interpreted frame)
         - java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireShared(int) @bci=10, line=1282 (Compiled frame)
         - java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.lock() @bci=5, line=731 (Compiled frame)
         - org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.createAndGetApplicationReport(java.lang.String, boolean) @bci=4, line=598 (Interpreted frame)
         - org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getApplications(org.apache.hadoop.yarn.api.protocolrecords.GetApplicationsRequest, boolean) @bci=610, line=814 (Interpreted frame)
         - org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getApplications(org.apache.hadoop.yarn.api.protocolrecords.GetApplicationsRequest) @bci=3, line=681 (Interpreted frame)
         - org.apache.hadoop.yarn.server.webapp.AppsBlock$1.run() @bci=11, line=89 (Interpreted frame)
         - org.apache.hadoop.yarn.server.webapp.AppsBlock$1.run() @bci=1, line=86 (Interpreted frame)
         - java.security.AccessController.doPrivileged(java.security.PrivilegedExceptionAction, java.security.AccessControlContext) @bci=0 (Compiled frame)
         - javax.security.auth.Subject.doAs(javax.security.auth.Subject, java.security.PrivilegedExceptionAction) @bci=42, line=415 (Compiled frame)
         - org.apache.hadoop.security.UserGroupInformation.doAs(java.security.PrivilegedExceptionAction) @bci=14, line=1657 (Compiled frame)
         - org.apache.hadoop.yarn.server.webapp.AppsBlock.fetchData() @bci=166, line=84 (Interpreted frame)
         - org.apache.hadoop.yarn.server.webapp.AppsBlock.render(org.apache.hadoop.yarn.webapp.view.HtmlBlock$Block) @bci=7, line=101 (Interpreted frame)
         - org.apache.hadoop.yarn.webapp.view.HtmlBlock.render() @bci=31, line=69 (Interpreted frame)
         - org.apache.hadoop.yarn.webapp.view.HtmlBlock.renderPartial() @bci=1, line=79 (Interpreted frame)
         - org.apache.hadoop.yarn.webapp.View.render(java.lang.Class) @bci=16, line=235 (Interpreted frame)
         - org.apache.hadoop.yarn.webapp.view.HtmlBlock$Block.subView(java.lang.Class) @bci=23, line=43 (Interpreted frame)
         - org.apache.hadoop.yarn.webapp.hamlet.Hamlet._(java.lang.Class) @bci=2, line=30347 (Interpreted frame)
         - org.apache.hadoop.yarn.server.resourcemanager.webapp.AppsBlockWithMetrics.render(org.apache.hadoop.yarn.webapp.view.HtmlBlock$Block) @bci=12, line=30 (Interpreted frame)
         - org.apache.hadoop.yarn.webapp.view.HtmlBlock.render() @bci=31, line=69 (Interpreted frame)
         - org.apache.hadoop.yarn.webapp.view.HtmlBlock.renderPartial() @bci=1, line=79 (Interpreted frame)
         - org.apache.hadoop.yarn.webapp.View.render(java.lang.Class) @bci=16, line=235 (Interpreted frame)
         - org.apache.hadoop.yarn.webapp.view.HtmlPage$Page.subView(java.lang.Class) @bci=23, line=49 (Interpreted frame)
         - org.apache.hadoop.yarn.webapp.hamlet.HamletImpl$EImp._v(java.lang.Class) @bci=9, line=117 (Interpreted frame)
         - org.apache.hadoop.yarn.webapp.hamlet.Hamlet$TD._(java.lang.Class) @bci=2, line=845 (Interpreted frame)
         - org.apache.hadoop.yarn.webapp.view.TwoColumnLayout.render(org.apache.hadoop.yarn.webapp.hamlet.Hamlet$HTML) @bci=211, line=56 (Interpreted frame)
         - org.apache.hadoop.yarn.webapp.view.HtmlPage.render() @bci=35, line=82 (Interpreted frame)
         - org.apache.hadoop.yarn.webapp.Dispatcher.render(java.lang.Class) @bci=13, line=197 (Interpreted frame)
         - org.apache.hadoop.yarn.webapp.Dispatcher.service(javax.servlet.http.HttpServletRequest, javax.servlet.http.HttpServletResponse) @bci=452, line=156 (Interpreted frame)
         - javax.servlet.http.HttpServlet.service(javax.servlet.ServletRequest, javax.servlet.ServletResponse) @bci=30, line=820 (Interpreted frame)
         - com.google.inject.servlet.ServletDefinition.doService(javax.servlet.ServletRequest, javax.servlet.ServletResponse) @bci=26, line=263 (Interpreted frame)
        

        Thread 3

        Thread 53696: (state = BLOCKED)
         - sun.misc.Unsafe.park(boolean, long) @bci=0 (Compiled frame; information may be imprecise)
         - java.util.concurrent.locks.LockSupport.park(java.lang.Object) @bci=14, line=186 (Interpreted frame)
         - java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt() @bci=1, line=834 (Interpreted frame)
         - java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(java.util.concurrent.locks.AbstractQueuedSynchronizer$Node, int) @bci=67, line=867 (Interpreted frame)
         - java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(int) @bci=17, line=1197 (Interpreted frame)
         - java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.lock() @bci=5, line=945 (Interpreted frame)
         - org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.pullRMNodeUpdates(java.util.Collection) @bci=4, line=584 (Interpreted frame)
         - org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.allocate(org.apache.hadoop.yarn.api.protocolrecords.AllocateRequest) @bci=825, line=551 (Interpreted frame)
         - org.apache.hadoop.yarn.api.impl.pb.service.ApplicationMasterProtocolPBServiceImpl.allocate(com.google.protobuf.RpcController, org.apache.hadoop.yarn.proto.YarnServiceProtos$AllocateRequestProto) @bci=14, line=60 (Interpreted frame)
         - org.apache.hadoop.yarn.proto.ApplicationMasterProtocol$ApplicationMasterProtocolService$2.callBlockingMethod(com.google.protobuf.Descriptors$MethodDescriptor, com.google.protobuf.RpcController, com.google.protobuf.Message) @bci=91, line=99 (Interpreted frame)
         - org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(org.apache.hadoop.ipc.RPC$Server, java.lang.String, org.apache.hadoop.io.Writable, long) @bci=246, line=616 (Interpreted frame)
         - org.apache.hadoop.ipc.RPC$Server.call(org.apache.hadoop.ipc.RPC$RpcKind, java.lang.String, org.apache.hadoop.io.Writable, long) @bci=9, line=969 (Interpreted frame)
         - org.apache.hadoop.ipc.Server$Handler$1.run() @bci=38, line=2151 (Interpreted frame)
         - org.apache.hadoop.ipc.Server$Handler$1.run() @bci=1, line=2147 (Interpreted frame)
         - java.security.AccessController.doPrivileged(java.security.PrivilegedExceptionAction, java.security.AccessControlContext) @bci=0 (Interpreted frame)
         - javax.security.auth.Subject.doAs(javax.security.auth.Subject, java.security.PrivilegedExceptionAction) @bci=42, line=415 (Interpreted frame)
         - org.apache.hadoop.security.UserGroupInformation.doAs(java.security.PrivilegedExceptionAction) @bci=14, line=1657 (Interpreted frame)
         - org.apache.hadoop.ipc.Server$Handler.run() @bci=315, line=2145 (Interpreted frame)
        

        Thread 1: RMAppAttemptImpl (holding the RMAppAttemptImpl write lock) -> SystemMetricsPublisher#appAttemptFinished -> RMAppImpl (trying to acquire the RMAppImpl read lock, which cannot be obtained because thread 3 has already requested the write lock).
        Thread 2: RMAppImpl (holding the RMAppImpl read lock) -> createAndGetApplicationReport -> RMAppAttemptImpl (trying to acquire the RMAppAttemptImpl read lock, which thread 1 holds for writing).
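
The analysis above depends on a reader being parked behind a queued writer even though the lock is currently held only by another reader. The following is a minimal sketch of that behavior with a plain `ReentrantReadWriteLock`. The names `QueuedWriterDemo` and `secondReaderGetsLock` are hypothetical and none of this is ResourceManager code. It assumes the default non-fair mode as implemented in OpenJDK, where a new reader waits if a writer is at the head of the wait queue.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical demo class, not part of Hadoop.
final class QueuedWriterDemo {

  /** Returns true if a second reader gets the read lock while a writer is queued. */
  static boolean secondReaderGetsLock() {
    final ReentrantReadWriteLock lock = new ReentrantReadWriteLock(); // non-fair by default
    final CountDownLatch readerHoldsLock = new CountDownLatch(1);
    final CountDownLatch releaseReader = new CountDownLatch(1);

    // First reader: takes the read lock and keeps it until told to release.
    Thread reader = new Thread(() -> {
      lock.readLock().lock();
      try {
        readerHoldsLock.countDown();
        releaseReader.await();
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      } finally {
        lock.readLock().unlock();
      }
    });

    // Writer: queues up behind the first reader.
    Thread writer = new Thread(() -> {
      lock.writeLock().lock();
      lock.writeLock().unlock();
    });

    try {
      reader.start();
      readerHoldsLock.await();
      writer.start();
      while (!lock.hasQueuedThread(writer)) {
        Thread.sleep(1); // wait until the writer is parked in the queue
      }
      // Second reader (this thread): a timed attempt so the demo cannot hang.
      boolean acquired = lock.readLock().tryLock(200, TimeUnit.MILLISECONDS);
      if (acquired) {
        lock.readLock().unlock();
      }
      return acquired;
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    } finally {
      // Let the first reader go so the writer can proceed and both threads exit.
      releaseReader.countDown();
      try {
        reader.join();
        writer.join();
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    }
  }

  public static void main(String[] args) {
    System.out.println("second reader acquired lock: " + secondReaderGetsLock());
  }
}
```

In this sketch the second reader times out instead of sharing the lock with the first reader, which mirrors how thread 1 can be stuck on a read lock that is not held by any writer.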

        jianhe Jian He added a comment -

        The patch removes the read lock in RMAppImpl#getFinalApplicationStatus as I think that's not required.
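
As a rough illustration of why dropping the read lock from a getter breaks this kind of cycle, here is a minimal sketch. It is not the actual patch: `AppSketch`, its `finalStatus` field and the use of `volatile` for visibility are assumptions made for the example, and the real `RMAppImpl#getFinalApplicationStatus` may rely on different guarantees.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical stand-in for an application object; not the real RMAppImpl.
final class AppSketch {
  private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
  // volatile so that a lock-free reader sees the latest written value
  private volatile String finalStatus = "UNDEFINED";

  void setFinalStatus(String status) {
    lock.writeLock().lock();
    try {
      finalStatus = status;
    } finally {
      lock.writeLock().unlock();
    }
  }

  // No read lock: the getter cannot be parked behind a holder or waiter of the write lock.
  String getFinalStatus() {
    return finalStatus;
  }

  /** Reads the status from another thread while this thread holds the write lock. */
  static String readWhileWriteLocked() {
    AppSketch app = new AppSketch();
    app.setFinalStatus("SUCCEEDED");
    final String[] seen = new String[1];
    app.lock.writeLock().lock();
    try {
      Thread t = new Thread(() -> seen[0] = app.getFinalStatus());
      t.start();
      t.join(); // returns promptly because the getter takes no lock
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    } finally {
      app.lock.writeLock().unlock();
    }
    return seen[0];
  }

  public static void main(String[] args) {
    System.out.println(readWhileWriteLocked());
  }
}
```

The trade-off in a design like this is that the getter no longer observes a state that is consistent with other fields guarded by the lock, so it only suits values that are safe to read on their own.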

        hadoopqa Hadoop QA added a comment -
        -1 overall



        Vote Subsystem Runtime Comment
        0 reexec 0m 0s Docker mode activated.
        +1 @author 0m 0s The patch does not contain any @author tags.
        -1 test4tests 0m 0s The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
        +1 mvninstall 8m 5s trunk passed
        +1 compile 0m 29s trunk passed with JDK v1.8.0_66
        +1 compile 0m 31s trunk passed with JDK v1.7.0_85
        +1 checkstyle 0m 14s trunk passed
        +1 mvnsite 0m 38s trunk passed
        +1 mvneclipse 0m 15s trunk passed
        +1 findbugs 1m 16s trunk passed
        -1 javadoc 0m 23s hadoop-yarn-server-resourcemanager in trunk failed with JDK v1.8.0_66.
        +1 javadoc 0m 28s trunk passed with JDK v1.7.0_85
        +1 mvninstall 0m 36s the patch passed
        +1 compile 0m 28s the patch passed with JDK v1.8.0_66
        +1 javac 0m 28s the patch passed
        +1 compile 0m 31s the patch passed with JDK v1.7.0_85
        +1 javac 0m 31s the patch passed
        +1 checkstyle 0m 13s the patch passed
        +1 mvnsite 0m 37s the patch passed
        +1 mvneclipse 0m 16s the patch passed
        +1 whitespace 0m 0s Patch has no whitespace issues.
        +1 findbugs 1m 22s the patch passed
        -1 javadoc 0m 23s hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.8.0_66.
        +1 javadoc 0m 28s the patch passed with JDK v1.7.0_85
        -1 unit 64m 48s hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.8.0_66.
        -1 unit 65m 35s hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.7.0_85.
        +1 asflicense 0m 27s Patch does not generate ASF License warnings.
        149m 13s



        Reason Tests
        JDK v1.8.0_66 Failed junit tests hadoop.yarn.server.resourcemanager.TestClientRMTokens
          hadoop.yarn.server.resourcemanager.TestAMAuthorization
        JDK v1.7.0_85 Failed junit tests hadoop.yarn.server.resourcemanager.TestClientRMTokens
          hadoop.yarn.server.resourcemanager.security.TestRMDelegationTokens
          hadoop.yarn.server.resourcemanager.TestAMAuthorization



        Subsystem Report/Notes
        Docker Image:yetus/hadoop:0ca8df7
        JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12776008/YARN-4424.1.patch
        JIRA Issue YARN-4424
        Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
        uname Linux 8e1b6abddd1b 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
        Build tool maven
        Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
        git revision trunk / 65f3952
        findbugs v3.0.0
        javadoc https://builds.apache.org/job/PreCommit-YARN-Build/9880/artifact/patchprocess/branch-javadoc-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager-jdk1.8.0_66.txt
        javadoc https://builds.apache.org/job/PreCommit-YARN-Build/9880/artifact/patchprocess/patch-javadoc-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager-jdk1.8.0_66.txt
        unit https://builds.apache.org/job/PreCommit-YARN-Build/9880/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager-jdk1.8.0_66.txt
        unit https://builds.apache.org/job/PreCommit-YARN-Build/9880/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager-jdk1.7.0_85.txt
        unit test logs https://builds.apache.org/job/PreCommit-YARN-Build/9880/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager-jdk1.8.0_66.txt https://builds.apache.org/job/PreCommit-YARN-Build/9880/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager-jdk1.7.0_85.txt
        JDK v1.7.0_85 Test Results https://builds.apache.org/job/PreCommit-YARN-Build/9880/testReport/
        modules C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
        Max memory used 76MB
        Powered by Apache Yetus http://yetus.apache.org
        Console output https://builds.apache.org/job/PreCommit-YARN-Build/9880/console

        This message was automatically generated.

        rohithsharma Rohith Sharma K S added a comment -

        Very interesting!
        From the thread dumps, I see that all 3 threads are BLOCKED on RMAppImpl, but I can't find the thread that holds the RMAppImpl lock. Is there some other thread flow that is holding the RMAppImpl lock?

        leftnoteasy Wangda Tan added a comment -

        I also think removing the read lock from getFinalApplicationStatus should resolve this problem.

        Rohith Sharma K S, this is a locking issue involving 3 threads; see Jian He's analysis:

        Thread 1: RMAppAttemptImpl (holding RMAppAttemptImpl write lock) -> SystemMetricsPublisher#appAttemptFinished -> RMAppImpl (trying to acquire RMAppImpl read lock, which cannot be obtained because thread 3 had requested the write lock)
        Thread 2: RMAppImpl (holding RMAppImpl read lock) -> createAndGetApplicationReport -> RMAppAttemptImpl (trying to acquire RMAppAttemptImpl read lock)
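
The idea of dropping the read lock from a simple getter can be sketched as follows. This is a minimal illustration, not the YARN-4424 patch or the real RMAppImpl code: the `App` class, its `finalStatus` field, and all method names are hypothetical, and it assumes the value can be published safely through a `volatile` field.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class LockFreeStatusGetter {

    // Hypothetical stand-in for an application object guarded by a read/write lock.
    static class App {
        private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
        // volatile: the getter sees the latest write without taking the lock.
        private volatile String finalStatus = "UNDEFINED";

        void setFinalStatus(String status) {
            lock.writeLock().lock();
            try {
                finalStatus = status;
            } finally {
                lock.writeLock().unlock();
            }
        }

        // No read lock here, so callers never queue behind a writer.
        String getFinalStatus() {
            return finalStatus;
        }

        ReentrantReadWriteLock lock() {
            return lock;
        }
    }

    // Reads the status while another thread holds the write lock.
    public static String readWhileWriteLocked() {
        App app = new App();
        app.setFinalStatus("SUCCEEDED");
        CountDownLatch held = new CountDownLatch(1);
        CountDownLatch release = new CountDownLatch(1);
        Thread writer = new Thread(() -> {
            app.lock().writeLock().lock();
            try {
                held.countDown();
                release.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                app.lock().writeLock().unlock();
            }
        });
        writer.start();
        try {
            held.await();
            String status = app.getFinalStatus(); // returns without waiting
            release.countDown();
            writer.join();
            return status;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(readWhileWriteLocked());
    }
}
```

Because the getter takes no lock, a caller that already holds some other lock (as thread 1 does above) cannot end up waiting on this object, which removes one edge from the lock cycle.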
        
        leftnoteasy Wangda Tan added a comment -

        +1 to the latest patch; I will commit it tomorrow if there are no objections.

        Jian He, could you confirm whether the test failures and javadoc warnings are related?

        jianhe Jian He added a comment -

        not related

        rohithsharma Rohith Sharma K S added a comment -

        Sorry if I was not clear in my previous comment. Correct me if my understanding is wrong.
        From the traces of the 3 threads, I see that all 3 are BLOCKED waiting for the RMAppImpl lock: Thread-1 and Thread-2 for the ReadLock, and Thread-3 for the WriteLock.

        Thread-1

        Thread 53785: (state = BLOCKED)

        • sun.misc.Unsafe.park(boolean, long) @bci=0 (Compiled frame; information may be imprecise)
        • java.util.concurrent.locks.LockSupport.park(java.lang.Object) @bci=14, line=186 (Interpreted frame)
        • java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt() @bci=1, line=834 (Interpreted frame)
        • java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireShared(int) @bci=83, line=964 (Interpreted frame)
        • java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireShared(int) @bci=10, line=1282 (Interpreted frame)
        • java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.lock() @bci=5, line=731 (Interpreted frame)
        • org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.getFinalApplicationStatus() @bci=4, line=478 (Interpreted frame)
        • org.apache.hadoop.yarn.server.resourcemanager.metrics.SystemMetricsPublisher.appAttemptFinished

        Thread-2

        Thread 25723: (state = BLOCKED)

        • sun.misc.Unsafe.park(boolean, long) @bci=0 (Compiled frame; information may be imprecise)
        • java.util.concurrent.locks.LockSupport.park(java.lang.Object) @bci=14, line=186 (Interpreted frame)
        • java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt() @bci=1, line=834 (Interpreted frame)
        • java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireShared(int) @bci=83, line=964 (Interpreted frame)
        • java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireShared(int) @bci=10, line=1282 (Compiled frame)
        • java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.lock() @bci=5, line=731 (Compiled frame)
        • org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.createAndGetApplicationReport(java.lang.String, boolean) @bci=4, line=598 (Interpreted frame)

        Thread-3

        Thread 53696: (state = BLOCKED)

        • sun.misc.Unsafe.park(boolean, long) @bci=0 (Compiled frame; information may be imprecise)
        • java.util.concurrent.locks.LockSupport.park(java.lang.Object) @bci=14, line=186 (Interpreted frame)
        • java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt() @bci=1, line=834 (Interpreted frame)
        • java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(java.util.concurrent.locks.AbstractQueuedSynchronizer$Node, int) @bci=67, line=867 (Interpreted frame)
        • java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(int) @bci=17, line=1197 (Interpreted frame)
        • java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.lock() @bci=5, line=945 (Interpreted frame)
        • org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.pullRMNodeUpdates(java.util.Collection) @bci=4, line=584 (Interpreted frame)

        Thread-2 is NOT holding the RMAppImpl read lock; it is blocked waiting for it. So my question is: which thread flow is holding the RMAppImpl lock?

        jianhe Jian He added a comment -

        Zhihai's comment is very useful for understanding this part. In short, Thread-2 is blocked on the read lock because Thread-3 had requested the write lock first.
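
The behavior described here, a reader waiting although no writer holds the lock yet, can be reproduced in isolation. The sketch below is a standalone demo, not RM code: the class and method names are made up, and it builds the lock in fair mode, where the JDK documents that a new reader blocks if a writer is waiting. The default non-fair mode leaves the ordering unspecified, although the analysis in this issue describes the same effect.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class QueuedWriterBlocksReader {

    // Returns true if a new reader cannot get the read lock while a writer is queued.
    public static boolean newReaderBlockedBehindWriter() {
        // Fair mode (assumption for this demo): new readers wait behind a queued writer.
        ReentrantReadWriteLock lock = new ReentrantReadWriteLock(true);
        CountDownLatch readerHolds = new CountDownLatch(1);
        CountDownLatch release = new CountDownLatch(1);

        // First reader: takes the read lock and keeps it until released.
        Thread reader1 = new Thread(() -> {
            lock.readLock().lock();
            try {
                readerHolds.countDown();
                release.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                lock.readLock().unlock();
            }
        });

        // Writer: queues behind the first reader.
        Thread writer = new Thread(() -> {
            lock.writeLock().lock();
            lock.writeLock().unlock();
        });

        try {
            reader1.start();
            readerHolds.await();
            writer.start();
            // Wait until the writer is actually parked in the lock's wait queue.
            while (!lock.hasQueuedThread(writer)) {
                Thread.sleep(10);
            }

            // Second reader (this thread): only a read lock is held, yet this times out.
            boolean acquired = lock.readLock().tryLock(200, TimeUnit.MILLISECONDS);
            if (acquired) {
                lock.readLock().unlock();
            }

            // Let the first reader go so the writer can finish.
            release.countDown();
            reader1.join();
            writer.join();
            return !acquired;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("new reader blocked: " + newReaderBlockedBehindWriter());
    }
}
```

If the first reader never releases its lock because it is itself waiting on another lock, both the queued writer and every later reader stay parked, which matches the three BLOCKED stacks above.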

        rohithsharma Rohith Sharma K S added a comment -

        I understand the potential deadlock that the patch is trying to solve.

        But from the jstack, why is Thread-3 not getting the write lock? Which thread is holding the read lock, or is some other thread already holding the write lock and not proceeding?

        jianhe Jian He added a comment - - edited

        Sorry, I pasted the wrong thread stack. The thread below is holding RMAppImpl's read lock and trying to access RMAppAttemptImpl. Thanks for checking so carefully!

        Thread 53732: (state = BLOCKED)
         - sun.misc.Unsafe.park(boolean, long) @bci=0 (Compiled frame; information may be imprecise)
         - java.util.concurrent.locks.LockSupport.park(java.lang.Object) @bci=14, line=186 (Interpreted frame)
         - java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt() @bci=1, line=834 (Interpreted frame)
         - java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireShared(int) @bci=83, line=964 (Interpreted frame)
         - java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireShared(int) @bci=10, line=1282 (Interpreted frame)
         - java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.lock() @bci=5, line=731 (Interpreted frame)
         - org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.getTrackingUrl() @bci=4, line=511 (Interpreted frame)
         - org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.createAndGetApplicationReport(java.lang.String, boolean) @bci=82, line=618 (Interpreted frame)
         - org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getApplicationReport(org.apache.hadoop.yarn.api.protocolrecords.GetApplicationReportRequest) @bci=116, line=334 (Interpreted frame)
         - org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getApplicationReport(com.google.protobuf.RpcController, org.apache.hadoop.yarn.proto.YarnServiceProtos$GetApplicationReportRequestProto) @bci=14, line=175 (Interpreted frame)
         - org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(com.google.protobuf.Descriptors$MethodDescriptor, com.google.protobuf.RpcController, com.google.protobuf.Message) @bci=156, line=417 (Interpreted frame)
         - org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(org.apache.hadoop.ipc.RPC$Server, java.lang.String, org.apache.hadoop.io.Writable, long) @bci=246, line=616 (Interpreted frame)
         - org.apache.hadoop.ipc.RPC$Server.call(org.apache.hadoop.ipc.RPC$RpcKind, java.lang.String, org.apache.hadoop.io.Writable, long) @bci=9, line=969 (Interpreted frame)
         - org.apache.hadoop.ipc.Server$Handler$1.run() @bci=38, line=2151 (Interpreted frame)
         - org.apache.hadoop.ipc.Server$Handler$1.run() @bci=1, line=2147 (Interpreted frame)
         - java.security.AccessController.doPrivileged(java.security.PrivilegedExceptionAction, java.security.AccessControlContext) @bci=0 (Interpreted frame)
         - javax.security.auth.Subject.doAs(javax.security.auth.Subject, java.security.PrivilegedExceptionAction) @bci=42, line=415 (Interpreted frame)
         - org.apache.hadoop.security.UserGroupInformation.doAs(java.security.PrivilegedExceptionAction) @bci=14, line=1657 (Interpreted frame)
         - org.apache.hadoop.ipc.Server$Handler.run() @bci=315, line=2145 (Interpreted frame)
        
        rohithsharma Rohith Sharma K S added a comment -

        Thanks Jian He for the clarification.
        I am +1 for the current patch.

        leftnoteasy Wangda Tan added a comment -

        Thanks Jian He, committed to trunk/branch-2/branch-2.6/branch-2.7.

        hudson Hudson added a comment -

        FAILURE: Integrated in Hadoop-trunk-Commit #8943 (See https://builds.apache.org/job/Hadoop-trunk-Commit/8943/)
        YARN-4424. Fix deadlock in RMAppImpl. (Jian he via wangda) (wangda: rev 7e4715186d31ac889fba26d453feedcebb11fc70)

        • hadoop-yarn-project/CHANGES.txt
        • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMAppImpl.java
        hudson Hudson added a comment -

        FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #677 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/677/)
        YARN-4424. Fix deadlock in RMAppImpl. (Jian he via wangda) (wangda: rev 7e4715186d31ac889fba26d453feedcebb11fc70)

        • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMAppImpl.java
        • hadoop-yarn-project/CHANGES.txt
        djp Junping Du added a comment -

        Hi Wangda Tan and Jian He, is this a blocker for 2.6.3? If so, we should commit it to 2.6.3.

        jianhe Jian He added a comment -

        yep, just merged to 2.6.3 too

        leftnoteasy Wangda Tan added a comment -

        Committed to branch-2.8.

        vinodkv Vinod Kumar Vavilapalli added a comment -

        This never actually made it to branch-2.7.2, even though the fix version says so. Thanks to Junping Du for catching this.

        I just cherry-picked it for rolling a new RC for 2.7.2. FYI.


          People

          • Assignee:
            jianhe Jian He
            Reporter:
            yeshavora Yesha Vora
              Development