Details

    • Type: Sub-task
    • Status: Patch Available
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: fairscheduler
    • Labels: None

      Description

      Collections.sort() consumes too much time in a scheduling round.

      1. sampling1.jpg
        320 kB
        Xianyin Xin
      2. sampling2.jpg
        285 kB
        Xianyin Xin
      3. YARN-4090.001.patch
        10 kB
        Xianyin Xin
      4. YARN-4090.002.patch
        10 kB
        Xianyin Xin
      5. YARN-4090.003.patch
        10 kB
        Xianyin Xin
      6. YARN-4090.004.patch
        6 kB
        zhangshilong
      7. YARN-4090.005.patch
        7 kB
        zhangshilong
      8. YARN-4090.006.patch
        7 kB
        zhangshilong
      9. YARN-4090.007.patch
        13 kB
        Yufei Gu
      10. YARN-4090.008.patch
        11 kB
        Yufei Gu
      11. YARN-4090-preview.patch
        6 kB
        Xianyin Xin
      12. YARN-4090-TestResult.pdf
        783 kB
        Xianyin Xin

        Issue Links

          Activity

          xinxianyin Xianyin Xin added a comment -

          I constructed a queue hierarchy with 3 levels:
          root
          child1 child2 child3
          child1.child1~10, child2.child1~15, child3.child1~15
          The number of leaf queues is 40, with a total of 1000 apps running randomly on the leaf queues. The sampling results show that about 2/3 of the CPU time of FSParentQueue.assignContainers() was spent in Collections.sort(). Within Collections.sort(), about 40% was spent in SchedulerApplicationAttempt.getCurrentConsumption() and about 36% in Resources.subtract(). The former is expensive because FSParentQueue.getResourceUsage() recurses over its children, while for the latter, the clone() in subtract() takes much CPU time.
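          The profile points toward replacing the recursive aggregation with an incrementally maintained counter, which is the direction the later patches take. Below is a minimal sketch of that idea with hypothetical names (the `Queue` class and `long`-valued resources are invented for illustration; they are not the actual Hadoop classes):

          ```java
          import java.util.ArrayList;
          import java.util.List;

          // Illustrative sketch, not the actual YARN patch: maintain an
          // incrementally updated usage counter in every queue, so a sort
          // comparator reads an O(1) field instead of recursing over the
          // whole subtree on each compare() call.
          class Queue {
              private final List<Queue> children = new ArrayList<>();
              private Queue parent;
              private long directUsage;   // resources allocated directly at this queue
              private long cachedUsage;   // subtree total, maintained incrementally

              void addChild(Queue child) {
                  child.parent = this;
                  children.add(child);
              }

              // O(subtree size): the pattern the profile showed dominating
              // Collections.sort() via the comparator.
              long recursiveUsage() {
                  long total = directUsage;
                  for (Queue c : children) {
                      total += c.recursiveUsage();
                  }
                  return total;
              }

              // O(tree depth): propagate the delta up the ancestor chain once
              // per allocation or release, instead of recomputing on every read.
              void incResourceUsage(long delta) {
                  directUsage += delta;
                  for (Queue q = this; q != null; q = q.parent) {
                      q.cachedUsage += delta;
                  }
              }

              long cachedUsage() {
                  return cachedUsage;
              }
          }
          ```

          With 40 leaf queues sorted on every scheduling round, moving the aggregation cost from every compare() call to each allocation event is where the savings come from.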

          xinxianyin Xianyin Xin added a comment -

          We should also pay attention to the ReadLock.lock() and unlock() in the first image, which also cost much time.

          xinxianyin Xianyin Xin added a comment -

          A simple fix and the corresponding test results are submitted. The results show how expensive the method used by the original comparator.compare() is: it recursively collects the resource usage of the two queues, together with a time-consuming call to FSAppAttempt.getResourceUsage().

          xinxianyin Xianyin Xin added a comment -

          Hi Wangda Tan, Karthik Kambatla, would you please take a look? Since this change is related to preemption, I have linked it with YARN-4120.

          kasha Karthik Kambatla added a comment -

          Is this a duplicate of YARN-1297?

          xinxianyin Xianyin Xin added a comment -

          Sorry Karthik Kambatla, I didn't notice that; thanks for the reminder. Yes, this JIRA can be seen as part of YARN-1297, so I will close it as a duplicate.

          xinxianyin Xianyin Xin added a comment -

          Let's move to YARN-1297 to continue the discussion.

          kasha Karthik Kambatla added a comment -

          Sorry for the confusion here. We had some issues with YARN-1297 and have moved this part of the change back here. Yufei Gu ran into some issues with existing tests. Xianyin Xin - are you able to post a patch that passes the tests?

          xinxianyin Xianyin Xin added a comment -

          Thanks, Karthik Kambatla, I will upload a new patch as soon as possible.

          yufeigu Yufei Gu added a comment -

          FYI, these are the three failed tests I ran into:
          org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairSchedulerPreemption#testPreemptionDecisionWithNonPreemptableQueue
          org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestAppRunnability#testMoveRunnableApp
          org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairSchedulerPreemption#testChoiceOfPreemptedContainers

          xinxianyin Xianyin Xin added a comment -

          Sorry for the delay, Karthik Kambatla, Yufei Gu. I just uploaded a patch that fixes the three test failures above. The two failures in TestFairSchedulerPreemption occurred because decResourceUsage was called twice when processing preemption (in both addPreemption() and containerCompleted()), and the failure in TestAppRunnability occurred because we missed updating the queue's resource usage when moving an app.
          Thanks Yufei Gu for the info.
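          The double-decrement fix described above can be sketched as follows. This is an illustrative reconstruction, not the actual patch: the `AppUsageTracker` class, string container ids, and `long`-valued resources are invented for the example, and only the method names `addPreemption`/`containerCompleted` come from the discussion:

          ```java
          import java.util.HashMap;
          import java.util.Map;

          // Illustrative sketch: decrement usage exactly once per container by
          // recording preempted containers and skipping the second decrement
          // when the container later completes.
          class AppUsageTracker {
              private long usage;
              private final Map<String, Long> preemptionMap = new HashMap<>();

              void allocate(String containerId, long res) {
                  usage += res;
              }

              // Called when the scheduler decides to preempt: decrement now and
              // remember the container so completion does not decrement again.
              void addPreemption(String containerId, long res) {
                  preemptionMap.put(containerId, res);
                  usage -= res;
              }

              // Called when the container actually exits.
              void containerCompleted(String containerId, long res) {
                  if (preemptionMap.remove(containerId) != null) {
                      return; // usage was already decremented in addPreemption()
                  }
                  usage -= res;
              }

              long usage() {
                  return usage;
              }
          }
          ```

          Without the guard, a preempted container would be decremented in both addPreemption() and containerCompleted(), leaving the queue's tracked usage too low, which is what broke the two preemption tests.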

          xinxianyin Xianyin Xin added a comment -

          Submitted the patch to kick Jenkins.

          hadoopqa Hadoop QA added a comment -
          -1 overall



          Vote Subsystem Runtime Comment
          0 reexec 6m 54s Docker mode activated.
          +1 @author 0m 0s The patch does not contain any @author tags.
          -1 test4tests 0m 0s The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
          +1 mvninstall 7m 12s trunk passed
          +1 compile 0m 38s trunk passed with JDK v1.8.0_92
          +1 compile 0m 30s trunk passed with JDK v1.7.0_95
          +1 checkstyle 0m 19s trunk passed
          +1 mvnsite 0m 36s trunk passed
          +1 mvneclipse 0m 15s trunk passed
          +1 findbugs 1m 4s trunk passed
          +1 javadoc 0m 20s trunk passed with JDK v1.8.0_92
          +1 javadoc 0m 26s trunk passed with JDK v1.7.0_95
          +1 mvninstall 0m 30s the patch passed
          +1 compile 0m 23s the patch passed with JDK v1.8.0_92
          +1 javac 0m 23s the patch passed
          +1 compile 0m 26s the patch passed with JDK v1.7.0_95
          +1 javac 0m 26s the patch passed
          +1 checkstyle 0m 16s the patch passed
          +1 mvnsite 0m 33s the patch passed
          +1 mvneclipse 0m 13s the patch passed
          +1 whitespace 0m 0s Patch has no whitespace issues.
          +1 findbugs 1m 14s the patch passed
          +1 javadoc 0m 18s the patch passed with JDK v1.8.0_92
          +1 javadoc 0m 22s the patch passed with JDK v1.7.0_95
          -1 unit 43m 13s hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.8.0_92.
          -1 unit 44m 46s hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.7.0_95.
          +1 asflicense 0m 19s Patch does not generate ASF License warnings.
          111m 49s



          Reason Tests
          JDK v1.8.0_92 Failed junit tests hadoop.yarn.server.resourcemanager.TestAMAuthorization
            hadoop.yarn.server.resourcemanager.TestContainerResourceUsage
            hadoop.yarn.server.resourcemanager.TestClientRMTokens
          JDK v1.8.0_92 Timed out junit tests org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesNodes
          JDK v1.7.0_95 Failed junit tests hadoop.yarn.server.resourcemanager.TestAMAuthorization
            hadoop.yarn.server.resourcemanager.TestContainerResourceUsage
            hadoop.yarn.server.resourcemanager.TestClientRMTokens
          JDK v1.7.0_95 Timed out junit tests org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesNodes



          Subsystem Report/Notes
          Docker Image:yetus/hadoop:7b1c37a
          JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12801177/YARN-4090.001.patch
          JIRA Issue YARN-4090
          Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
          uname Linux e79848e812a8 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
          Build tool maven
          Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
          git revision trunk / 6f26b66
          Default Java 1.7.0_95
          Multi-JDK versions /usr/lib/jvm/java-8-oracle:1.8.0_92 /usr/lib/jvm/java-7-openjdk-amd64:1.7.0_95
          findbugs v3.0.0
          unit https://builds.apache.org/job/PreCommit-YARN-Build/11257/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager-jdk1.8.0_92.txt
          unit https://builds.apache.org/job/PreCommit-YARN-Build/11257/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager-jdk1.7.0_95.txt
          unit test logs https://builds.apache.org/job/PreCommit-YARN-Build/11257/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager-jdk1.8.0_92.txt https://builds.apache.org/job/PreCommit-YARN-Build/11257/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager-jdk1.7.0_95.txt
          JDK v1.7.0_95 Test Results https://builds.apache.org/job/PreCommit-YARN-Build/11257/testReport/
          modules C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
          Console output https://builds.apache.org/job/PreCommit-YARN-Build/11257/console
          Powered by Apache Yetus 0.2.0 http://yetus.apache.org

          This message was automatically generated.

          xinxianyin Xianyin Xin added a comment -

          It seems the failed UTs under JDK 1.8.0_92 and JDK 1.7.0_95 are not triggered by the patch.

          yufeigu Yufei Gu added a comment -

          Thanks, Xianyin Xin! Looks really good. All three previous test failures are
          solved. The override of move() is a reasonable solution.

          Minor nit: do you mean "bring down" or "decrease" when you said "write down" in this comment?
          <quote>
          // do not decResource when the container exited in the preemptionMap
          // before because we have written down the resource when adding the
          // container to preemptionMap in this#addPreemption.
          </quote>
          Karthik Kambatla, wanna take a look?

          yufeigu Yufei Gu added a comment -

          Another minor nit: there are two spaces between "synchronized" and "void" in public synchronized void incResourceUsage(Resource res).

          xinxianyin Xianyin Xin added a comment -

          Sorry for the delay, Yufei Gu.

          do you mean "bring down" or "decrease" when you said "write
          down" in this comment?

          Yes, I mean decrease; I have changed the comment in the new patch.

          there are two spaces between "synchronized" and "void" in public synchronized void incResourceUsage(Resourc

          Fixed.

          hadoopqa Hadoop QA added a comment -
          -1 overall



          Vote Subsystem Runtime Comment
          0 reexec 0m 12s Docker mode activated.
          +1 @author 0m 0s The patch does not contain any @author tags.
          -1 test4tests 0m 0s The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
          +1 mvninstall 6m 23s trunk passed
          +1 compile 0m 29s trunk passed
          +1 checkstyle 0m 21s trunk passed
          +1 mvnsite 0m 35s trunk passed
          +1 mvneclipse 0m 12s trunk passed
          +1 findbugs 0m 54s trunk passed
          +1 javadoc 0m 22s trunk passed
          +1 mvninstall 0m 28s the patch passed
          +1 compile 0m 27s the patch passed
          +1 javac 0m 27s the patch passed
          +1 checkstyle 0m 17s the patch passed
          +1 mvnsite 0m 32s the patch passed
          +1 mvneclipse 0m 10s the patch passed
          +1 whitespace 0m 0s Patch has no whitespace issues.
          +1 findbugs 1m 0s the patch passed
          +1 javadoc 0m 20s the patch passed
          -1 unit 34m 24s hadoop-yarn-server-resourcemanager in the patch failed.
          +1 asflicense 0m 15s Patch does not generate ASF License warnings.
          47m 58s



          Reason Tests
          Failed junit tests hadoop.yarn.server.resourcemanager.TestRMRestart
            hadoop.yarn.server.resourcemanager.TestClientRMTokens
            hadoop.yarn.server.resourcemanager.TestAMAuthorization



          Subsystem Report/Notes
          Docker Image:yetus/hadoop:2c91fd8
          JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12805578/YARN-4090.002.patch
          JIRA Issue YARN-4090
          Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
          uname Linux 8bc88ac2f971 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
          Build tool maven
          Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
          git revision trunk / 6161d9b
          Default Java 1.8.0_91
          findbugs v3.0.0
          unit https://builds.apache.org/job/PreCommit-YARN-Build/11629/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt
          unit test logs https://builds.apache.org/job/PreCommit-YARN-Build/11629/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt
          Test Results https://builds.apache.org/job/PreCommit-YARN-Build/11629/testReport/
          modules C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
          Console output https://builds.apache.org/job/PreCommit-YARN-Build/11629/console
          Powered by Apache Yetus 0.2.0 http://yetus.apache.org

          This message was automatically generated.

          yufeigu Yufei Gu added a comment -

          Xianyin Xin, thanks for working on this. LGTM. Karthik Kambatla, wanna take a look?

          hadoopqa Hadoop QA added a comment -
          -1 overall



          Vote Subsystem Runtime Comment
          0 reexec 0m 18s Docker mode activated.
          +1 @author 0m 0s The patch does not contain any @author tags.
          -1 test4tests 0m 0s The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
          +1 mvninstall 6m 42s trunk passed
          +1 compile 0m 33s trunk passed
          +1 checkstyle 0m 20s trunk passed
          +1 mvnsite 0m 38s trunk passed
          +1 mvneclipse 0m 17s trunk passed
          +1 findbugs 0m 56s trunk passed
          +1 javadoc 0m 21s trunk passed
          +1 mvninstall 0m 30s the patch passed
          +1 compile 0m 31s the patch passed
          +1 javac 0m 31s the patch passed
          +1 checkstyle 0m 17s the patch passed
          +1 mvnsite 0m 35s the patch passed
          +1 mvneclipse 0m 14s the patch passed
          +1 whitespace 0m 0s The patch has no whitespace issues.
          +1 findbugs 1m 0s the patch passed
          +1 javadoc 0m 18s the patch passed
          +1 unit 37m 40s hadoop-yarn-server-resourcemanager in the patch passed.
          +1 asflicense 0m 16s The patch does not generate ASF License warnings.
          52m 5s



          Subsystem Report/Notes
          Docker Image:yetus/hadoop:9560f25
          JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12805578/YARN-4090.002.patch
          JIRA Issue YARN-4090
          Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
          uname Linux 55cb850923d2 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
          Build tool maven
          Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
          git revision trunk / 9daa997
          Default Java 1.8.0_101
          findbugs v3.0.0
          Test Results https://builds.apache.org/job/PreCommit-YARN-Build/12782/testReport/
          modules C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
          Console output https://builds.apache.org/job/PreCommit-YARN-Build/12782/console
          Powered by Apache Yetus 0.3.0 http://yetus.apache.org

          This message was automatically generated.

          He Tianyi He Tianyi added a comment -

          Hi there.
          I tried to backport this to 2.6.0 and it seems a deadlock occurs (or possibly non-fair synchronization).
          Containers only get assigned periodically.

          Any clue? Thanks.

          gsaha Gour Saha added a comment -

          Xianyin Xin the patch does not apply cleanly on trunk anymore. Can you please provide a new rebased patch?

          xinxianyin Xianyin Xin added a comment -

          Sorry for the delay, He Tianyi, and sorry that I didn't examine the patch's behavior on 2.6.0. Have you found the reason causing the periodic assignment?

          xinxianyin Xianyin Xin added a comment -

          Sorry for the delay, Gour Saha. I have attached a new patch based on the latest trunk.

          piaoyu zhang zhangyubiao added a comment -

          Xianyin Xin, our team applied the patch to 2.7.1 and found that a deadlock occurs.

          piaoyu zhang zhangyubiao added a comment -

          Found one Java-level deadlock:
          =============================
          "IPC Server handler 98 on 8032":
          waiting to lock monitor 0x00007f4e48b1f808 (object 0x00007f42e17a5ed8, a org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSParentQueue),
          which is held by "IPC Server handler 76 on 8032"
          "IPC Server handler 76 on 8032":
          waiting to lock monitor 0x00007f4e388b94f8 (object 0x00007f42df3e8450, a org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSParentQueue),
          which is held by "ResourceManager Event Processor"
          "ResourceManager Event Processor":
          waiting to lock monitor 0x00007f4e48b1f808 (object 0x00007f42e17a5ed8, a org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSParentQueue),
          which is held by "IPC Server handler 76 on 8032"

          Java stack information for the threads listed above:
          ===================================================
          "IPC Server handler 98 on 8032":
              at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSParentQueue.getQueueUserAclInfo(FSParentQueue.java:149)
              - waiting to lock <0x00007f42e17a5ed8> (a org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSParentQueue)
              at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.getQueueUserAclInfo(FairScheduler.java:1468)
              at org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getQueueUserAcls(ClientRMService.java:903)
              at org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getQueueUserAcls(ApplicationClientProtocolPBServiceImpl.java:280)
              at org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:431)
              at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
              at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)
              at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2048)
              at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2044)
              at java.security.AccessController.doPrivileged(Native Method)
              at javax.security.auth.Subject.doAs(Subject.java:415)
              at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1656)
              at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2042)

          "IPC Server handler 76 on 8032":
              at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSParentQueue.getQueueUserAclInfo(FSParentQueue.java:149)
              - waiting to lock <0x00007f42df3e8450> (a org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSParentQueue)
              at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSParentQueue.getQueueUserAclInfo(FSParentQueue.java:156)
              - locked <0x00007f42e17a5ed8> (a org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSParentQueue)
              at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.getQueueUserAclInfo(FairScheduler.java:1468)
              at org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getQueueUserAcls(ClientRMService.java:903)
              at org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getQueueUserAcls(ApplicationClientProtocolPBServiceImpl.java:280)
              at org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:431)
              at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
              at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)
              at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2048)
              at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2044)
              at java.security.AccessController.doPrivileged(Native Method)
              at javax.security.auth.Subject.doAs(Subject.java:415)
              at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1656)
              at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2042)

          "ResourceManager Event Processor":
              at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSQueue.decResourceUsage(FSQueue.java:307)
              - waiting to lock <0x00007f42e17a5ed8> (a org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSParentQueue)
              at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSQueue.decResourceUsage(FSQueue.java:309)
              - locked <0x00007f42df3e8450> (a org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSParentQueue)
              at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSQueue.decResourceUsage(FSQueue.java:309)
              - locked <0x00007f42e0c7cf50> (a org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue)
              at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt.containerCompleted(FSAppAttempt.java:157)
              - locked <0x00007f42deaf9aa8> (a org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt)
              at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.completedContainer(FairScheduler.java:829)
              - eliminated <0x00007f42deaf8288> (a org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler)
              at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.nodeUpdate(FairScheduler.java:984)
              - locked <0x00007f42deaf8288> (a org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler)
              at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:1195)
              at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:121)
              at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:680)
              at java.lang.Thread.run(Thread.java:745)

          Found 1 deadlock.

          zhengchenyu zhengchenyu added a comment - - edited

          Here we see a deadlock:
          "IPC Server handler 98 on 8032" is waiting for lock (0x00007f42e17a5ed8).
          "IPC Server handler 76 on 8032" holds lock (0x00007f42e17a5ed8) and is waiting for lock (0x00007f42df3e8450).
          "ResourceManager Event Processor" holds lock (0x00007f42df3e8450) and is waiting for lock (0x00007f42e17a5ed8).

          In fact, 0x00007f42e17a5ed8 is the object lock of an FSParentQueue, which I'll call root.Parent.
          0x00007f42df3e8450 is the object lock of another FSParentQueue, a child queue of 0x00007f42e17a5ed8, which I'll call root.Parent.Child.

          Let's trace these thread.
          (1) ResourceManager Event Processor

          FairScheduler.handle
            FairScheduler.nodeUpdate
              FairScheduler.completedContainer
                FSAppAttempt.containerCompleted
                  FSLeafQueue.decResourceUsage
                   //got the lock 0x00007f42e0c7cf50				
                    FSParentQueue.decResourceUsage				
          	   //got the lock 0x00007f42df3e8450 which is the object lock of root.Parent.Child
          	    FSParentQueue.decResourceUsage				
          	     //wait for 0x00007f42e17a5ed8 which is the object lock of root.Parent
          

          (2) IPC Server handler 76 on 8032

          ClientRMService.getQueueUserAcls
            FairScheduler.getQueueUserAclInfo
              FSParentQueue.getQueueUserAclInfo
               //got the lock 0x00007f42e17a5ed8
                FSParentQueue.getQueueUserAclInfo
                 //wait for the lock 0x00007f42df3e8450
          

          The remaining thread is unnecessary to analyze. Here we can see that decResourceUsage acquires the object locks from bottom to top, while getQueueUserAclInfo acquires them from top to bottom.
          getQueueUserAclInfo acquired the object locks of root and root.Parent and is waiting for root.Parent.Child, while decResourceUsage acquired the object lock of root.Parent.Child and is waiting for root.Parent. That's a deadlock.
          I recommend rewriting decResourceUsage to acquire the object locks from top to bottom. Another option is to replace the object locks with a ReadWriteLock.
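          The two remedies just described can be sketched as follows. This is a minimal, hypothetical illustration (DemoQueue and its fields are invented; this is not the actual FSParentQueue code): each queue level guards its own usage counter with a ReentrantReadWriteLock, the read-only traversal takes shared read locks top-down, and the bottom-up update releases each level's write lock before touching the parent, so no thread ever holds a child lock while waiting for a parent lock.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical queue node (not the real FSParentQueue) illustrating the
// two remedies: per-level ReadWriteLocks, and never holding a child lock
// while acquiring the parent lock.
class DemoQueue {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private final List<DemoQueue> children = new ArrayList<>();
    private final DemoQueue parent;
    private int usage; // aggregate usage of this queue's subtree

    DemoQueue(DemoQueue parent) {
        this.parent = parent;
        if (parent != null) {
            parent.children.add(this);
        }
    }

    // Top-down read traversal (analogous to getQueueUserAclInfo): collects
    // each queue's usage under shared read locks, which never exclude
    // other readers, so concurrent traversals cannot deadlock.
    List<Integer> snapshotUsages() {
        List<Integer> out = new ArrayList<>();
        lock.readLock().lock();
        try {
            out.add(usage);
            for (DemoQueue child : children) {
                out.addAll(child.snapshotUsages());
            }
        } finally {
            lock.readLock().unlock();
        }
        return out;
    }

    // Bottom-up update (analogous to decResourceUsage): each level takes
    // its own write lock briefly and releases it before touching the
    // parent, so a child lock is never held while waiting for a parent.
    void addUsage(int delta) {
        lock.writeLock().lock();
        try {
            usage += delta;
        } finally {
            lock.writeLock().unlock();
        }
        if (parent != null) {
            parent.addUsage(delta);
        }
    }
}
```

          Under these assumptions an update started at a leaf propagates the delta to every ancestor, and a concurrent top-down snapshot can never form a wait cycle with it.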

          xinxianyin Xianyin Xin added a comment -

          Hi zhengchenyu, I've moved to another project, so I don't have enough time to handle this problem. From your convincing analysis, I believe you must already have a patch, right? Would you mind taking it over?

          He Tianyi He Tianyi added a comment -

          Hi, Xianyin Xin. The patch didn't cause the deadlock; there must be another cause.

          BTW, I've been running a build with the patch for nearly 3 months on a cluster serving 50M container allocations each day, and everything works fine.

          piaoyu zhang zhangyubiao added a comment -

          Hi, He Tianyi, have you enabled the fair scheduler ACLs? In our tests, the deadlock does not appear when the ACLs are not enabled.

          He Tianyi He Tianyi added a comment - - edited

          ACLs are not enabled in my cluster.

          zsl2007 zhangshilong added a comment -

          Xianyin Xin Yufei Gu This optimization works very well in our environment; I'd like to continue this issue.

          zsl2007 zhangshilong added a comment -

          Would you please tell me which YARN version you used?
          In trunk:
          FairScheduler.getQueueUserAclInfo does not lock the FSQueue object.
          The FSQueue object is locked only in decResourceUsage or incrResourceUsage.
          FairScheduler:

            @Override
            public List<QueueUserACLInfo> getQueueUserAclInfo() {
              UserGroupInformation user;
              try {
                user = UserGroupInformation.getCurrentUser();
              } catch (IOException ioe) {
                return new ArrayList<QueueUserACLInfo>();
              }
          
              return queueMgr.getRootQueue().getQueueUserAclInfo(user);
            }
          

          FSParentQueue.java

            @Override
            public List<QueueUserACLInfo> getQueueUserAclInfo(UserGroupInformation user) {
              List<QueueUserACLInfo> userAcls = new ArrayList<>();
              
              // Add queue acls
              userAcls.add(getUserAclInfo(user));
              
              // Add children queue acls
              readLock.lock();
              try {
                for (FSQueue child : childQueues) {
                  userAcls.addAll(child.getQueueUserAclInfo(user));
                }
              } finally {
                readLock.unlock();
              }
           
              return userAcls;
            }
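          As a hypothetical side note on why the trunk code above is safe for concurrent traversals: read locks are shared, so even two threads that acquire a pair of read locks in opposite orders both complete, whereas the same interleaving with monitor locks is the classic lock-order deadlock. The class below is an invented demonstration, not Hadoop code.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Invented demonstration: two threads take a pair of read locks in
// opposite orders. With monitor locks this interleaving can deadlock;
// with shared read locks both threads finish.
class ReadLockDemo {
    static final ReentrantReadWriteLock parentLock = new ReentrantReadWriteLock();
    static final ReentrantReadWriteLock childLock = new ReentrantReadWriteLock();

    static void traverse(ReentrantReadWriteLock first,
                         ReentrantReadWriteLock second,
                         CountDownLatch ready) throws InterruptedException {
        first.readLock().lock();
        try {
            ready.countDown();
            ready.await();              // both threads now hold their first lock
            second.readLock().lock();   // succeeds: read locks are shared
            second.readLock().unlock();
        } finally {
            first.readLock().unlock();
        }
    }

    // Returns true when both threads finished, i.e. no deadlock occurred.
    static boolean run() {
        CountDownLatch ready = new CountDownLatch(2);
        Thread a = new Thread(() -> {
            try { traverse(parentLock, childLock, ready); }
            catch (InterruptedException ignored) { }
        });
        Thread b = new Thread(() -> {
            try { traverse(childLock, parentLock, ready); }
            catch (InterruptedException ignored) { }
        });
        a.start();
        b.start();
        try {
            a.join(5000);
            b.join(5000);
        } catch (InterruptedException ignored) {
            return false;
        }
        return !a.isAlive() && !b.isAlive();
    }
}
```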
          
          zsl2007 zhangshilong added a comment -

          I see. In branch-2.6, FSParentQueue.java does lock the FSQueue object (the method is synchronized), but the code changed in version 2.7.1:

            @Override
            public synchronized List<QueueUserACLInfo> getQueueUserAclInfo(
                UserGroupInformation user) {
              List<QueueUserACLInfo> userAcls = new ArrayList<QueueUserACLInfo>();
              
              // Add queue acls
              userAcls.add(getUserAclInfo(user));
              
              // Add children queue acls
              for (FSQueue child : childQueues) {
                userAcls.addAll(child.getQueueUserAclInfo(user));
              }
           
              return userAcls;
            }
          
          zsl2007 zhangshilong added a comment -

          fix 2.6 deadlock in FSParentQueue.java

          xinxianyin Xianyin Xin added a comment -

          Hi zhangshilong, sorry, I have moved to another project and don't have enough time. I'm changing it to unassigned; anyone who wants to take over is welcome.

          zsl2007 zhangshilong added a comment -

          No problem. Thank you very much for your patch; it was a great help to me.

          zhengchenyu zhengchenyu added a comment -

          Please take a look at YARN-5188, which has better performance. Our patch only changes "synchronized" to a ReadWriteLock so that the deadlock disappears.

          yufeigu Yufei Gu added a comment -

          Submit the patch to let Hadoop QA kick in.
          Hi zhangshilong, are you still working on this?

          hadoopqa Hadoop QA added a comment -
          -1 overall



          Vote Subsystem Runtime Comment
          0 reexec 0m 11s Docker mode activated.
          +1 @author 0m 0s The patch does not contain any @author tags.
          -1 test4tests 0m 0s The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
          +1 mvninstall 13m 14s trunk passed
          +1 compile 0m 38s trunk passed
          +1 checkstyle 0m 22s trunk passed
          +1 mvnsite 0m 40s trunk passed
          +1 mvneclipse 0m 16s trunk passed
          +1 findbugs 1m 8s trunk passed
          +1 javadoc 0m 23s trunk passed
          -1 mvninstall 0m 19s hadoop-yarn-server-resourcemanager in the patch failed.
          -1 compile 0m 19s hadoop-yarn-server-resourcemanager in the patch failed.
          -1 javac 0m 19s hadoop-yarn-server-resourcemanager in the patch failed.
          -0 checkstyle 0m 19s hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: The patch generated 1 new + 19 unchanged - 0 fixed = 20 total (was 19)
          -1 mvnsite 0m 19s hadoop-yarn-server-resourcemanager in the patch failed.
          +1 mvneclipse 0m 11s the patch passed
          -1 whitespace 0m 0s The patch has 2 line(s) that end in whitespace. Use git apply --whitespace=fix <<patch_file>>. Refer https://git-scm.com/docs/git-apply
          -1 findbugs 0m 15s hadoop-yarn-server-resourcemanager in the patch failed.
          +1 javadoc 0m 18s the patch passed
          -1 unit 0m 18s hadoop-yarn-server-resourcemanager in the patch failed.
          +1 asflicense 0m 18s The patch does not generate ASF License warnings.
          20m 42s



          Subsystem Report/Notes
          Docker Image:yetus/hadoop:a9ad5d6
          JIRA Issue YARN-4090
          JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12845162/YARN-4090.004.patch
          Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
          uname Linux 8b261bf98c0f 3.13.0-107-generic #154-Ubuntu SMP Tue Dec 20 09:57:27 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
          Build tool maven
          Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
          git revision trunk / 0914fcc
          Default Java 1.8.0_121
          findbugs v3.0.0
          mvninstall https://builds.apache.org/job/PreCommit-YARN-Build/14818/artifact/patchprocess/patch-mvninstall-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt
          compile https://builds.apache.org/job/PreCommit-YARN-Build/14818/artifact/patchprocess/patch-compile-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt
          javac https://builds.apache.org/job/PreCommit-YARN-Build/14818/artifact/patchprocess/patch-compile-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt
          checkstyle https://builds.apache.org/job/PreCommit-YARN-Build/14818/artifact/patchprocess/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt
          mvnsite https://builds.apache.org/job/PreCommit-YARN-Build/14818/artifact/patchprocess/patch-mvnsite-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt
          whitespace https://builds.apache.org/job/PreCommit-YARN-Build/14818/artifact/patchprocess/whitespace-eol.txt
          findbugs https://builds.apache.org/job/PreCommit-YARN-Build/14818/artifact/patchprocess/patch-findbugs-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt
          unit https://builds.apache.org/job/PreCommit-YARN-Build/14818/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt
          Test Results https://builds.apache.org/job/PreCommit-YARN-Build/14818/testReport/
          modules C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
          Console output https://builds.apache.org/job/PreCommit-YARN-Build/14818/console
          Powered by Apache Yetus 0.5.0-SNAPSHOT http://yetus.apache.org

          This message was automatically generated.

          Show
          zsl2007 zhangshilong added a comment -

          Thanks Yufei Gu. I will submit the new patch as soon as possible.
          hadoopqa Hadoop QA added a comment -
          -1 overall



          Vote Subsystem Runtime Comment
          0 reexec 0m 13s Docker mode activated.
          +1 @author 0m 0s The patch does not contain any @author tags.
          -1 test4tests 0m 0s The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
          +1 mvninstall 14m 32s trunk passed
          +1 compile 0m 32s trunk passed
          +1 checkstyle 0m 22s trunk passed
          +1 mvnsite 0m 36s trunk passed
          +1 mvneclipse 0m 15s trunk passed
          +1 findbugs 1m 11s trunk passed
          +1 javadoc 0m 24s trunk passed
          +1 mvninstall 0m 38s the patch passed
          +1 compile 0m 36s the patch passed
          +1 javac 0m 36s the patch passed
          -0 checkstyle 0m 20s hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: The patch generated 1 new + 19 unchanged - 0 fixed = 20 total (was 19)
          +1 mvnsite 0m 39s the patch passed
          +1 mvneclipse 0m 13s the patch passed
          -1 whitespace 0m 0s The patch has 1 line(s) that end in whitespace. Use git apply --whitespace=fix <<patch_file>>. Refer https://git-scm.com/docs/git-apply
          +1 findbugs 1m 4s the patch passed
          +1 javadoc 0m 18s the patch passed
          +1 unit 40m 38s hadoop-yarn-server-resourcemanager in the patch passed.
          +1 asflicense 0m 21s The patch does not generate ASF License warnings.
          64m 16s



          Subsystem Report/Notes
          Docker Image:yetus/hadoop:a9ad5d6
          JIRA Issue YARN-4090
          JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12850973/YARN-4090.005.patch
          Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
          uname Linux d1b6c176e4bc 3.13.0-107-generic #154-Ubuntu SMP Tue Dec 20 09:57:27 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
          Build tool maven
          Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
          git revision trunk / 3ea6d35
          Default Java 1.8.0_121
          findbugs v3.0.0
          checkstyle https://builds.apache.org/job/PreCommit-YARN-Build/14826/artifact/patchprocess/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt
          whitespace https://builds.apache.org/job/PreCommit-YARN-Build/14826/artifact/patchprocess/whitespace-eol.txt
          Test Results https://builds.apache.org/job/PreCommit-YARN-Build/14826/testReport/
          modules C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
          Console output https://builds.apache.org/job/PreCommit-YARN-Build/14826/console
          Powered by Apache Yetus 0.5.0-SNAPSHOT http://yetus.apache.org

          This message was automatically generated.

          zsl2007 zhangshilong added a comment -

          So sorry for the whitespace. A new patch will be submitted.
          I don't think any more unit tests are needed. What do you think, Yufei Gu?
          hadoopqa Hadoop QA added a comment -
          -1 overall



          Vote Subsystem Runtime Comment
          0 reexec 0m 14s Docker mode activated.
          +1 @author 0m 0s The patch does not contain any @author tags.
          -1 test4tests 0m 0s The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
          +1 mvninstall 13m 6s trunk passed
          +1 compile 0m 32s trunk passed
          +1 checkstyle 0m 23s trunk passed
          +1 mvnsite 0m 35s trunk passed
          +1 mvneclipse 0m 15s trunk passed
          +1 findbugs 1m 4s trunk passed
          +1 javadoc 0m 22s trunk passed
          +1 mvninstall 0m 32s the patch passed
          +1 compile 0m 29s the patch passed
          +1 javac 0m 29s the patch passed
          +1 checkstyle 0m 19s the patch passed
          +1 mvnsite 0m 34s the patch passed
          +1 mvneclipse 0m 13s the patch passed
          +1 whitespace 0m 0s The patch has no whitespace issues.
          +1 findbugs 1m 10s the patch passed
          +1 javadoc 0m 19s the patch passed
          +1 unit 40m 28s hadoop-yarn-server-resourcemanager in the patch passed.
          +1 asflicense 0m 17s The patch does not generate ASF License warnings.
          62m 16s



          Subsystem Report/Notes
          Docker Image:yetus/hadoop:a9ad5d6
          JIRA Issue YARN-4090
          JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12851303/YARN-4090.006.patch
          Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
          uname Linux 97decb7240ad 3.13.0-107-generic #154-Ubuntu SMP Tue Dec 20 09:57:27 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
          Build tool maven
          Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
          git revision trunk / d88497d
          Default Java 1.8.0_121
          findbugs v3.0.0
          Test Results https://builds.apache.org/job/PreCommit-YARN-Build/14840/testReport/
          modules C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
          Console output https://builds.apache.org/job/PreCommit-YARN-Build/14840/console
          Powered by Apache Yetus 0.5.0-SNAPSHOT http://yetus.apache.org

          This message was automatically generated.

          yufeigu Yufei Gu added a comment - - edited

          Hi zhangshilong, thanks for working on this. I may have misunderstood your patch, so to make sure we are on the same page, let me clarify: we are trying to remove the recursive computation in getResourceUsage() to improve performance. Right? Based on that, here are some thoughts:

          1. In your patch, FSParentQueue no longer computes getResourceUsage() recursively; instead, resource usage is updated while allocating, recovering, and moving. Does FSParentQueue also get its resource usage updated when an application or its tasks finish? Resource usage is a critical queue metric, so it would be nice to include some unit tests (or modify existing ones) to make sure we get it right.
          2. Any reason we don't refactor getResourceUsage() in FSLeafQueue as well?

          Some minor nits:

          1. Use fsQueue instead of queue so you don't need to cast.
          2. Need one empty line after the move() function.
          zsl2007 zhangshilong added a comment -

          Thanks Yufei Gu.
          When an application finishes or its tasks finish, FSParentQueue and FSLeafQueue should update resourceUsage. It should also be updated on preemption. In Xianyin Xin's patch YARN-4090.003.patch, preemption and task completion have been considered. When creating the patch file, one of my commits was left out by mistake.
          In my view, resourceUsage in FSParentQueue and FSLeafQueue should be updated on allocation, task completion, and preemption.
          From the QA messages I see that unit tests are needed, so I will add unit tests for the resourceUsage calculation.
          yufeigu Yufei Gu added a comment -

          Thanks zhangshilong.
          I think you have it covered. Things that might affect resource usage are:

          1. Submitting an application
          2. Allocating tasks
          3. Moving an app from one queue to another
          4. Completing containers (both apps and tasks finishing)
          5. Preempting containers
          6. Nodes going down; this might not affect it directly, need to double check. Thanks Ray for pointing that out.

          I could be missing something; correct me if you have anything else. I guess the test cases need to cover these. We might not need to worry about queue deletion since it doesn't seem to be supported.

          YARN-4691 is about the same thing for FSLeafQueue. It makes sense to me to fold YARN-4691 into this JIRA.
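          As a rough illustration of the caching idea under discussion (all class and method names here are hypothetical, not the actual FSQueue API): each queue keeps its own usage counter, and every allocation or release walks up the parent chain, turning the recursive O(subtree) read into an O(1) read plus an O(depth) write.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not Hadoop code: a queue hierarchy that caches
// resource usage instead of recursively summing children on every read.
public class CachedQueue {
    final String name;
    final CachedQueue parent;
    final List<CachedQueue> children = new ArrayList<>();
    private long usedMemoryMb = 0;  // cached usage, maintained incrementally

    CachedQueue(String name, CachedQueue parent) {
        this.name = name;
        this.parent = parent;
        if (parent != null) {
            parent.children.add(this);
        }
    }

    // O(depth) update on each allocation, instead of an O(subtree)
    // recursive sum on every Collections.sort() comparison.
    void incUsedResource(long mb) {
        for (CachedQueue q = this; q != null; q = q.parent) {
            q.usedMemoryMb += mb;
        }
    }

    void decUsedResource(long mb) {
        for (CachedQueue q = this; q != null; q = q.parent) {
            q.usedMemoryMb -= mb;
        }
    }

    long getResourceUsage() {
        return usedMemoryMb;  // O(1), no recursion over children
    }

    public static void main(String[] args) {
        CachedQueue root = new CachedQueue("root", null);
        CachedQueue child1 = new CachedQueue("child1", root);
        CachedQueue leaf = new CachedQueue("child1.child1", child1);

        leaf.incUsedResource(1024);  // container allocated
        leaf.incUsedResource(2048);  // container allocated
        leaf.decUsedResource(1024);  // container completed

        System.out.println(root.getResourceUsage());    // 2048
        System.out.println(child1.getResourceUsage());  // 2048
    }
}
```

          Each event in the list above maps to one inc or dec call at a leaf, which is why all of those code paths must be instrumented for the cache to stay correct.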
          zsl2007 zhangshilong added a comment -

          Thanks Yufei Gu for the reminder and the extra information.
          YARN-4691 is about the same thing for ResourceUsage; this JIRA will solve the problem mentioned in YARN-4691.
          zsl2007 zhangshilong added a comment -

          Yufei Gu, I found one problem while working on this issue.
          In FSLeafQueue, I think the resourceUsage of an app should not change during assignContainer, because FairShareComparator uses resourceUsage to sort apps:

          private TreeSet<FSAppAttempt> fetchAppsWithDemand() {
              TreeSet<FSAppAttempt> pendingForResourceApps =
                  new TreeSet<>(policy.getComparator());
              readLock.lock();
              try {
                for (FSAppAttempt app : runnableApps) {
                  Resource pending = app.getAppAttemptResourceUsage().getPending();
                  if (!pending.equals(none())) {
                    pendingForResourceApps.add(app);
                  }
                }
              } finally {
                readLock.unlock();
              }
              return pendingForResourceApps;
            }
          

          But in FSPreemptionThread, run -> preemptContainers -> app.trackContainerForPreemption changes the app's preemptedResources without holding the FairScheduler lock. As a result, the app's getResourceUsage() can change while assignContainer in FSLeafQueue is running:

          @Override
            public Resource getResourceUsage() {
              /*
               * getResourcesToPreempt() returns zero, except when there are containers
               * to preempt. Avoid creating an object in the common case.
               */
              return getPreemptedResources().equals(Resources.none())
                  ? getCurrentConsumption()
                  : Resources.subtract(getCurrentConsumption(), getPreemptedResources());
            }
          
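          The hazard described above (a comparator key mutated by another thread while apps sit in a sorted structure) can be reproduced in isolation. This is an illustrative sketch, not Hadoop code; the App class and its fields are made up to stand in for FSAppAttempt:

```java
import java.util.Comparator;
import java.util.TreeSet;

// Illustrative only: mutating a field that a TreeSet's comparator reads,
// after the element is inserted, silently breaks the set's invariants.
public class ComparatorMutationDemo {
    static class App {
        final String id;
        volatile long usage;  // stands in for getResourceUsage() minus preempted resources
        App(String id, long usage) { this.id = id; this.usage = usage; }
    }

    public static void main(String[] args) {
        TreeSet<App> apps = new TreeSet<>(
            Comparator.comparingLong((App a) -> a.usage).thenComparing(a -> a.id));
        App a = new App("a", 10);
        App b = new App("b", 20);
        apps.add(b);
        apps.add(a);

        // Another thread (e.g. a preemption thread) mutates the comparator key
        // without holding the scheduler lock:
        a.usage = 30;

        // The tree was built while a.usage == 10, so lookups that follow the
        // new key navigate the wrong way and cannot find the element:
        System.out.println(apps.remove(a));  // false: "a" is in the set but unreachable
        System.out.println(apps.size());     // 2
    }
}
```

          This is why the sorted snapshot in fetchAppsWithDemand() needs the comparator's inputs to stay stable for the duration of the scheduling pass.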
          yufeigu Yufei Gu added a comment -

          zhangshilong, I believe Karthik has answered your question in YARN-4752.
          ashwinshankar77 Ashwin Shankar added a comment -

          zhangshilong, Yufei Gu: is this patch going to be committed? It looks like a valuable optimization.
          yufeigu Yufei Gu added a comment -

          Yes, at least I hope so, but the patch is not ready.
          yufeigu Yufei Gu added a comment -

          Hi zhangshilong, are you still working on this?
          zsl2007 zhangshilong added a comment -

          I am very sorry. I am still working on the YARN project, but my job has been busy so I had no time to finish the patch. I will try my best to finish this before 2017.10.1.
          yufeigu Yufei Gu added a comment -

          Hi zhangshilong, would you mind if I take it over? I have a patch ready to post later today.
          yufeigu Yufei Gu added a comment - - edited

          Uploaded patch v7, which caches the resource usage in queues. We need to update the cached resource usage for a queue and its parents in the following situations; patch v7 covers all of them:

          • allocate, which increases resource usage
          • moveApplication, which increases resource usage in one queue and decreases it in another
          • container completed, which decreases resource usage
          • preemption, which decreases resource usage when tracking resources to be preempted for a job
          • add a node -> recover containers on the node -> increase resource usage
          • remove a node -> complete a container -> decrease resource usage
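          A toy consistency check for the event list above; the ToyQueue class and its methods are invented for illustration and are not Hadoop's API. The invariant is that after every event, the incrementally maintained cache at each queue matches a full recursive recomputation:

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch: exercise the events listed above and check that the
// cached usage never drifts from the recursive "ground truth" sum.
public class UsageConsistencyCheck {
    static class ToyQueue {
        final ToyQueue parent;
        final List<ToyQueue> children = new ArrayList<>();
        long cached = 0;  // maintained incrementally, as in patch v7

        ToyQueue(ToyQueue parent) {
            this.parent = parent;
            if (parent != null) parent.children.add(this);
        }

        void inc(long mb) { for (ToyQueue q = this; q != null; q = q.parent) q.cached += mb; }
        void dec(long mb) { for (ToyQueue q = this; q != null; q = q.parent) q.cached -= mb; }

        // The old recursive computation, used here only as ground truth.
        long recompute() {
            if (children.isEmpty()) return cached;
            long sum = 0;
            for (ToyQueue c : children) sum += c.recompute();
            return sum;
        }
    }

    public static void main(String[] args) {
        ToyQueue root = new ToyQueue(null);
        ToyQueue child1 = new ToyQueue(root);
        ToyQueue child2 = new ToyQueue(root);
        ToyQueue leafA = new ToyQueue(child1);
        ToyQueue leafB = new ToyQueue(child2);

        leafA.inc(4096);                   // allocate
        leafB.inc(2048);                   // recover containers on an added node
        leafA.dec(1024); leafB.inc(1024);  // moveApplication: dec source path, inc destination path
        leafB.dec(2048);                   // container completed or preempted

        System.out.println(root.cached == root.recompute());      // true
        System.out.println(child1.cached == child1.recompute());  // true
    }
}
```

          A unit test along these lines (against the real FSParentQueue/FSLeafQueue) would catch any update path the patch misses.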
          hadoopqa Hadoop QA added a comment -
          -1 overall



          Vote Subsystem Runtime Comment
          0 reexec 0m 18s Docker mode activated.
                Prechecks
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 test4tests 0m 0s The patch appears to include 1 new or modified test files.
                trunk Compile Tests
          +1 mvninstall 14m 22s trunk passed
          +1 compile 0m 42s trunk passed
          +1 checkstyle 0m 32s trunk passed
          +1 mvnsite 0m 42s trunk passed
          +1 findbugs 1m 4s trunk passed
          +1 javadoc 0m 22s trunk passed
                Patch Compile Tests
          +1 mvninstall 0m 35s the patch passed
          +1 compile 0m 33s the patch passed
          +1 javac 0m 33s the patch passed
          -0 checkstyle 0m 26s hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: The patch generated 1 new + 210 unchanged - 0 fixed = 211 total (was 210)
          +1 mvnsite 0m 36s the patch passed
          +1 whitespace 0m 0s The patch has no whitespace issues.
          +1 findbugs 1m 8s the patch passed
          +1 javadoc 0m 19s the patch passed
                Other Tests
          -1 unit 43m 57s hadoop-yarn-server-resourcemanager in the patch failed.
          +1 asflicense 0m 14s The patch does not generate ASF License warnings.
          67m 7s



          Reason Tests
          Failed junit tests hadoop.yarn.server.resourcemanager.scheduler.capacity.TestContainerResizing
            hadoop.yarn.server.resourcemanager.scheduler.capacity.TestContainerAllocation



          Subsystem Report/Notes
          Docker Image:yetus/hadoop:14b5c93
          JIRA Issue YARN-4090
          JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12882668/YARN-4090.007.patch
          Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
          uname Linux 67ef57bc6ceb 3.13.0-119-generic #166-Ubuntu SMP Wed May 3 12:18:55 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
          Build tool maven
          Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
          git revision trunk / 2d105a2
          Default Java 1.8.0_144
          findbugs v3.1.0-RC1
          checkstyle https://builds.apache.org/job/PreCommit-YARN-Build/16998/artifact/patchprocess/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt
          unit https://builds.apache.org/job/PreCommit-YARN-Build/16998/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt
          Test Results https://builds.apache.org/job/PreCommit-YARN-Build/16998/testReport/
          modules C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
          Console output https://builds.apache.org/job/PreCommit-YARN-Build/16998/console
          Powered by Apache Yetus 0.6.0-SNAPSHOT http://yetus.apache.org

          This message was automatically generated.

          templedf Daniel Templeton added a comment -

          Thanks for moving this JIRA forward, Yufei Gu. This sounds like an important improvement that I'd like to see make it in by 3.0 beta 1. Beta 1 will be cut on 15th September 2017, so we'll need to move quickly. zhangshilong, would it be alright with you if Yufei Gu takes this JIRA over, or do you think you'll be able to wrap it up in the next 3 weeks?
          zsl2007 zhangshilong added a comment - - edited

          Daniel Templeton Yufei Gu Never mind. It is my fault.
          yufeigu Yufei Gu added a comment -

          Thanks zhangshilong.
          templedf Daniel Templeton added a comment -

          Thanks for the updated patch, Yufei Gu. Two comments. First, the management of preempted containers' resources seems to run at odds with YARN-7057. Maybe update this patch after you commit that one? Second, note that the CS AbstractCSQueue already has incUsedResource() and decUsedResource() methods. In the interest of commonality, maybe see if it's worth pushing those up to Queue and using them instead of adding your own methods for FS.
          yufeigu Yufei Gu added a comment -

          Thanks for the review, Daniel Templeton. Uploaded patch v8 for your first comment. I've changed the function names to incUsedResource and decUsedResource, but I didn't pull the functions up into the Queue class since the ones in CS have different signatures. We could do that refactoring when we add node labels to FS.

            People

            • Assignee:
              yufeigu Yufei Gu
              Reporter:
              xinxianyin Xianyin Xin
            • Votes:
              2 Vote for this issue
              Watchers:
              28 Start watching this issue

              Dates

              • Created:
                Updated:

                Development