Hadoop YARN: YARN-325

RM CapacityScheduler can deadlock when getQueueInfo() is called and a container is completing

    Details

      Description

      If a client calls getQueueInfo() on a parent queue (e.g. the root queue) while containers are completing, the RM can deadlock. getQueueInfo() locks the ParentQueue and then calls each child queue's getQueueInfo() method in turn. However, when a container completes, the RM locks the LeafQueue and then calls back into the ParentQueue. The two paths acquire the same pair of locks in opposite orders, so when they run concurrently they can deadlock.
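The opposite lock orders described above can be sketched in a few lines of Java. This is only an illustration: Parent and Leaf are hypothetical stand-ins for ParentQueue and LeafQueue, not the Hadoop classes, and both paths run one after the other on a single thread so the sketch records the lock orders without actually deadlocking.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical stand-ins for ParentQueue and LeafQueue; not the Hadoop classes.
class LockOrderSketch {
    static class Parent {
        Leaf child;
        final Set<String> orders;

        Parent(Set<String> orders) {
            this.orders = orders;
        }

        // Client path: takes the parent monitor first, then the child's.
        synchronized void getQueueInfo() {
            child.getQueueInfo();
        }

        // Scheduler path ends here, possibly with the leaf monitor still held.
        synchronized void completedContainer() {
            if (Thread.holdsLock(child)) {
                orders.add("leaf->parent");
            }
        }
    }

    static class Leaf {
        final Parent parent;
        final Set<String> orders;

        Leaf(Parent parent, Set<String> orders) {
            this.parent = parent;
            this.orders = orders;
        }

        synchronized void getQueueInfo() {
            if (Thread.holdsLock(parent)) {
                orders.add("parent->leaf");
            }
        }

        // Scheduler path: takes the leaf monitor first, then calls up into the parent.
        synchronized void assignReservedContainer() {
            parent.completedContainer();
        }
    }

    // Runs both paths sequentially and returns the lock orders that were observed.
    static Set<String> observedLockOrders() {
        Set<String> orders = new LinkedHashSet<>();
        Parent parent = new Parent(orders);
        Leaf leaf = new Leaf(parent, orders);
        parent.child = leaf;
        parent.getQueueInfo();
        leaf.assignReservedContainer();
        return orders;
    }

    public static void main(String[] args) {
        System.out.println(observedLockOrders());
    }
}
```

Both orders end up in the returned set. If the two paths ran on separate threads, each could hold its first monitor while waiting for the other's, which is the cycle this issue reports.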

      Stacktrace to follow.

      1. YARN-325-branch23.patch (8 kB, Thomas Graves)
      2. YARN-325.patch (7 kB, Arun C Murthy)
      3. YARN-325.patch (9 kB, Arun C Murthy)

        Activity

        Hudson added a comment -

        Integrated in Hadoop-Hdfs-trunk #1281 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk/1281/)
        YARN-325. RM CapacityScheduler can deadlock when getQueueInfo() is called and a container is completing (Arun C Murthy via tgraves) (Revision 1431070)

        Result = FAILURE
        tgraves : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1431070
        Files :

        • /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
        • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CSAssignment.java
        • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java
        • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.java
        • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestLeafQueue.java
        Hudson added a comment -

        Integrated in Hadoop-Mapreduce-trunk #1309 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1309/)
        YARN-325. RM CapacityScheduler can deadlock when getQueueInfo() is called and a container is completing (Arun C Murthy via tgraves) (Revision 1431070)

        Result = FAILURE
        tgraves : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1431070
        Files :

        • /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
        • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CSAssignment.java
        • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java
        • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.java
        • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestLeafQueue.java
        Hudson added a comment -

        Integrated in Hadoop-Hdfs-0.23-Build #490 (See https://builds.apache.org/job/Hadoop-Hdfs-0.23-Build/490/)
        YARN-325. RM CapacityScheduler can deadlock when getQueueInfo() is called and a container is completing (Arun C Murthy via tgraves) (Revision 1431071)

        Result = FAILURE
        tgraves : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1431071
        Files :

        • /hadoop/common/branches/branch-0.23/hadoop-yarn-project/CHANGES.txt
        • /hadoop/common/branches/branch-0.23/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CSAssignment.java
        • /hadoop/common/branches/branch-0.23/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java
        • /hadoop/common/branches/branch-0.23/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.java
        • /hadoop/common/branches/branch-0.23/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestLeafQueue.java
        Hudson added a comment -

        Integrated in Hadoop-Yarn-trunk #92 (See https://builds.apache.org/job/Hadoop-Yarn-trunk/92/)
        YARN-325. RM CapacityScheduler can deadlock when getQueueInfo() is called and a container is completing (Arun C Murthy via tgraves) (Revision 1431070)

        Result = SUCCESS
        tgraves : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1431070
        Files :

        • /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
        • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CSAssignment.java
        • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java
        • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.java
        • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestLeafQueue.java
        Hudson added a comment -

        Integrated in Hadoop-trunk-Commit #3205 (See https://builds.apache.org/job/Hadoop-trunk-Commit/3205/)
        YARN-325. RM CapacityScheduler can deadlock when getQueueInfo() is called and a container is completing (Arun C Murthy via tgraves) (Revision 1431070)

        Result = SUCCESS
        tgraves : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1431070
        Files :

        • /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
        • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CSAssignment.java
        • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java
        • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.java
        • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestLeafQueue.java
        Thomas Graves added a comment -

        +1. Gridmix runs look good. I'll commit this shortly.

        Arun C Murthy added a comment -

        Thanks Thomas Graves!

        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12563946/YARN-325-branch23.patch
        against trunk revision .

        -1 patch. The patch command could not apply the patch.

        Console output: https://builds.apache.org/job/PreCommit-YARN-Build/329//console

        This message is automatically generated.

        Thomas Graves added a comment -

        Thanks Arun! This is what we were thinking too. The code looks good. I'm going to run through gridmix to exercise some of the reservation cases a bit and, if that looks good, will commit.

        I've merged this to branch-0.23 also and will attach that patch for reference.

        Hadoop QA added a comment -

        +1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12563898/YARN-325.patch
        against trunk revision .

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 1 new or modified test files.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 eclipse:eclipse. The patch built with eclipse:eclipse.

        +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 core tests. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

        +1 contrib tests. The patch passed contrib unit tests.

        Test results: https://builds.apache.org/job/PreCommit-YARN-Build/328//testReport/
        Console output: https://builds.apache.org/job/PreCommit-YARN-Build/328//console

        This message is automatically generated.

        Arun C Murthy added a comment -

        Added unit-tests.

        Arun C Murthy added a comment -

        Illustrative patch, need to fix unit-tests yet.

        Arun C Murthy added a comment -

        Ok, the fix is to bubble the 'non-requirement' of the reservation all the way up to the CapacityScheduler.nodeUpdate call and then call LeafQueue.completedContainer outside the context of LeafQueue.assignContainers, i.e. do not call LeafQueue.completedContainer while holding the lock on the LeafQueue.

        LeafQueue.completedContainer, on its own, has the right synchronization, i.e. it doesn't call ParentQueue.completedContainer while holding a lock on the LeafQueue.
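A minimal sketch of that approach, under stated assumptions: the class names, the Assignment result object and its excessReservation field are hypothetical illustrations, not the contents of the attached patch. The synchronized method only reports that a reservation is no longer needed, and the caller releases it after the leaf lock has been dropped.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of "decide under the lock, call up outside it".
class DeferredCompletionSketch {
    // Hypothetical result object carrying the reservation that is no longer needed.
    static final class Assignment {
        final String excessReservation; // container id to release, or null

        Assignment(String excessReservation) {
            this.excessReservation = excessReservation;
        }
    }

    static class Parent {
        final List<String> log = new ArrayList<>();
        Leaf child;

        synchronized void completedContainer(String containerId) {
            // Record whether the caller still held the leaf monitor.
            log.add(containerId + " leafLockHeld=" + Thread.holdsLock(child));
        }
    }

    static class Leaf {
        final Parent parent;

        Leaf(Parent parent) {
            this.parent = parent;
        }

        // Synchronized: makes the decision but never calls up into the parent.
        synchronized Assignment assignContainers(String reserved, boolean stillNeeded) {
            return new Assignment(stillNeeded ? null : reserved);
        }

        void completedContainer(String containerId) {
            synchronized (this) {
                // Leaf-local bookkeeping would go here.
            }
            // The leaf monitor has been released before calling the parent.
            parent.completedContainer(containerId);
        }
    }

    // Stand-in for the scheduler's node update: acts on the bubbled-up result.
    static List<String> nodeUpdate(String reserved, boolean stillNeeded) {
        Parent parent = new Parent();
        Leaf leaf = new Leaf(parent);
        parent.child = leaf;
        Assignment assignment = leaf.assignContainers(reserved, stillNeeded);
        if (assignment.excessReservation != null) {
            leaf.completedContainer(assignment.excessReservation);
        }
        return parent.log;
    }

    public static void main(String[] args) {
        System.out.println(nodeUpdate("container_1", false));
    }
}
```

With this shape, the parent is only ever entered without the leaf monitor held, so the leaf-then-parent lock order no longer occurs on the scheduling path.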

        Arun C Murthy added a comment -

        Jason Lowe, this seems limited to a corner case (not that it should be ignored :-)) in LeafQueue.assignReservedContainer.

        The issue is that LeafQueue.assignReservedContainer is a synchronized method which calls completedContainer... need to figure out a way around this.

        Jason Lowe added a comment -

        Stacktrace of an occurrence:

        "IPC Server handler 28 on xxxx":
                at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.getQueueInfo(LeafQueue.java:513)
                - waiting to lock <0x00002aaaee2e1600> (a org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue)
                at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.getQueueInfo(ParentQueue.java:314)
                - locked <0x00002aaaee2a7548> (a org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue)
                at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.getQueueInfo(CapacityScheduler.java:527)
                at org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getQueueInfo(ClientRMService.java:382)
                at org.apache.hadoop.yarn.api.impl.pb.service.ClientRMProtocolPBServiceImpl.getQueueInfo(ClientRMProtocolPBServiceImpl.java:181)
                at org.apache.hadoop.yarn.proto.ClientRMProtocol$ClientRMProtocolService$2.callBlockingMethod(ClientRMProtocol.java:188)
                at org.apache.hadoop.yarn.ipc.ProtoOverHadoopRpcEngine$Server.call(ProtoOverHadoopRpcEngine.java:353)
                at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1530)
                at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1526)
                at java.security.AccessController.doPrivileged(Native Method)
                at javax.security.auth.Subject.doAs(Subject.java:396)
                at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1212)
                at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1524)
        "ResourceManager Event Processor":
                at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.completedContainer(ParentQueue.java:685)
                - waiting to lock <0x00002aaaee2a7548> (a org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue)
                at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.completedContainer(LeafQueue.java:1359)
                at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignReservedContainer(LeafQueue.java:860)
                - locked <0x00002aaaee2e1600> (a org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue)
                at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainers(LeafQueue.java:763)
                - locked <0x00002aaaee2e1600> (a org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue)
                at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.nodeUpdate(CapacityScheduler.java:586)
                - locked <0x00002aaaee28b090> (a org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler)
                at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:635)
                at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:80)
                at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:341)
                at java.lang.Thread.run(Thread.java:619)
        
        Found 1 deadlock.
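The issue does not say which tool produced the dump above. As a hedged aside, a running JVM can be asked the same question programmatically through the standard java.lang.management API; the class and method names below (DeadlockProbe, countDeadlockedThreads, describe) are hypothetical helpers, not part of Hadoop.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

// Hypothetical helper that asks the current JVM for deadlocked threads.
class DeadlockProbe {
    // Number of threads currently stuck in a monitor or synchronizer deadlock.
    static int countDeadlockedThreads() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        long[] ids = threads.findDeadlockedThreads(); // null when there is none
        return ids == null ? 0 : ids.length;
    }

    // Human-readable summary, including locked monitors for each stuck thread.
    static String describe() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        long[] ids = threads.findDeadlockedThreads();
        if (ids == null) {
            return "no deadlock";
        }
        StringBuilder report = new StringBuilder();
        for (ThreadInfo info : threads.getThreadInfo(ids, true, true)) {
            if (info != null) { // a thread may have exited in the meantime
                report.append(info);
            }
        }
        return report.toString();
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```

A probe like this only observes threads inside the JVM it runs in, so it would have to execute within the affected process (for example from a monitoring thread) to be useful.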
        

          People

          • Assignee: Arun C Murthy
          • Reporter: Jason Lowe
          • Votes: 0
          • Watchers: 8
