Hadoop YARN / YARN-3388

Allocation in LeafQueue could get stuck because DRF calculator isn't well supported when computing user-limit

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.6.0, 2.8.0, 2.7.2, 3.0.0-alpha1
    • Fix Version/s: 2.8.0, 3.0.0-alpha1
    • Component/s: capacityscheduler
    • Labels: None
    • Hadoop Flags: Reviewed

      Description

      When there are multiple active users in a queue, it should be possible for those users to make use of capacity up to max_capacity (or close). The resources should be fairly distributed among the active users in the queue. This works pretty well when a single resource is being scheduled. However, when there are multiple resources the situation gets more complex and the current algorithm tends to get stuck at the queue's configured capacity.

      An example is illustrated in a subsequent comment.

      1. YARN-3388-v0.patch
        5 kB
        Nathan Roberts
      2. YARN-3388-v1.patch
        18 kB
        Nathan Roberts
      3. YARN-3388-v2.patch
        25 kB
        Nathan Roberts
      4. YARN-3388-v3.patch
        31 kB
        Nathan Roberts
      5. YARN-3388-v4.patch
        30 kB
        Nathan Roberts
      6. YARN-3388-v5.patch
        31 kB
        Nathan Roberts
      7. YARN-3388-v6.patch
        31 kB
        Nathan Roberts
      8. YARN-3388-v7.patch
        30 kB
        Nathan Roberts

        Activity

        leftnoteasy Wangda Tan added a comment -

        Committed to trunk/branch-2/branch-2.8, thanks Nathan Roberts for the patch and thanks Jason Lowe for reviews!

        hudson Hudson added a comment -

        SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #10313 (See https://builds.apache.org/job/Hadoop-trunk-Commit/10313/)
        YARN-3388. Allocation in LeafQueue could get stuck because DRF (wangda: rev 444b2ea7afebf9f6c3d356154b71abfd0ea95b23)

        • (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestUtils.java
        • (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestLeafQueue.java
        • (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.java
        leftnoteasy Wangda Tan added a comment -

        +1 to latest patch, will commit shortly.

        hadoopqa Hadoop QA added a comment -
        -1 overall



        Vote Subsystem Runtime Comment
        0 reexec 0m 16s Docker mode activated.
        +1 @author 0m 0s The patch does not contain any @author tags.
        +1 test4tests 0m 0s The patch appears to include 2 new or modified test files.
        +1 mvninstall 7m 47s trunk passed
        +1 compile 0m 39s trunk passed
        +1 checkstyle 0m 28s trunk passed
        +1 mvnsite 0m 40s trunk passed
        +1 mvneclipse 0m 18s trunk passed
        +1 findbugs 1m 3s trunk passed
        +1 javadoc 0m 22s trunk passed
        +1 mvninstall 0m 35s the patch passed
        +1 compile 0m 34s the patch passed
        +1 javac 0m 34s the patch passed
        -1 checkstyle 0m 26s hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: The patch generated 3 new + 514 unchanged - 3 fixed = 517 total (was 517)
        +1 mvnsite 0m 39s the patch passed
        +1 mvneclipse 0m 14s the patch passed
        +1 whitespace 0m 0s The patch has no whitespace issues.
        +1 findbugs 1m 8s the patch passed
        +1 javadoc 0m 20s the patch passed
        +1 unit 38m 18s hadoop-yarn-server-resourcemanager in the patch passed.
        +1 asflicense 0m 15s The patch does not generate ASF License warnings.
        54m 44s



        Subsystem Report/Notes
        Docker Image:yetus/hadoop:9560f25
        JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12824402/YARN-3388-v7.patch
        JIRA Issue YARN-3388
        Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
        uname Linux 09eecc8d8e6f 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
        Build tool maven
        Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
        git revision trunk / c5c3e81
        Default Java 1.8.0_101
        findbugs v3.0.0
        checkstyle https://builds.apache.org/job/PreCommit-YARN-Build/12828/artifact/patchprocess/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt
        Test Results https://builds.apache.org/job/PreCommit-YARN-Build/12828/testReport/
        modules C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
        Console output https://builds.apache.org/job/PreCommit-YARN-Build/12828/console
        Powered by Apache Yetus 0.3.0 http://yetus.apache.org

        This message was automatically generated.

        nroberts Nathan Roberts added a comment -

        Thanks Wangda Tan for the comments. I took both suggestions in the latest patch and cleaned up some remaining checkstyle issues.

        leftnoteasy Wangda Tan added a comment -

        Thanks Nathan Roberts for updating the patch.

        Two minor comments:

        1) The following check is not required for get/set/incUsageRatio:

                if (label == null) {
                  label = RMNodeLabelsManager.NO_LABEL;
                }
        

        Because we normalize all requests before submitting them to the scheduler, you can assume no null labels exist.

        2) In public float getUsageRatio(String string), it's better to rename the parameter variable to "label" or "partition".
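
        For illustration, the suggested shape might look like the following minimal sketch. The class name UsageRatios matches the inner class mentioned elsewhere in this issue, but the field and method bodies here are assumptions for clarity, not the actual patch code:

            import java.util.HashMap;
            import java.util.Map;

            class UsageRatios {
              // Per-partition usage ratios. Labels are assumed to be normalized
              // to RMNodeLabelsManager.NO_LABEL before they reach the scheduler,
              // so no null checks are needed.
              private final Map<String, Float> ratios = new HashMap<>();

              public float getUsageRatio(String label) {
                Float ratio = ratios.get(label);   // label is never null here
                return ratio == null ? 0f : ratio; // explicit default instead of auto-unboxing a null
              }

              public void setUsageRatio(String label, float ratio) {
                ratios.put(label, ratio);
              }

              public void incUsageRatio(String label, float delta) {
                ratios.put(label, getUsageRatio(label) + delta);
              }
            }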

        hadoopqa Hadoop QA added a comment -
        -1 overall



        Vote Subsystem Runtime Comment
        0 reexec 0m 17s Docker mode activated.
        +1 @author 0m 0s The patch does not contain any @author tags.
        +1 test4tests 0m 0s The patch appears to include 2 new or modified test files.
        +1 mvninstall 7m 12s trunk passed
        +1 compile 0m 33s trunk passed
        +1 checkstyle 0m 27s trunk passed
        +1 mvnsite 0m 40s trunk passed
        +1 mvneclipse 0m 18s trunk passed
        +1 findbugs 1m 0s trunk passed
        +1 javadoc 0m 22s trunk passed
        +1 mvninstall 0m 32s the patch passed
        +1 compile 0m 29s the patch passed
        +1 javac 0m 29s the patch passed
        -1 checkstyle 0m 25s hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: The patch generated 19 new + 514 unchanged - 3 fixed = 533 total (was 517)
        +1 mvnsite 0m 36s the patch passed
        +1 mvneclipse 0m 15s the patch passed
        +1 whitespace 0m 0s The patch has no whitespace issues.
        +1 findbugs 1m 2s the patch passed
        +1 javadoc 0m 19s the patch passed
        +1 unit 37m 41s hadoop-yarn-server-resourcemanager in the patch passed.
        +1 asflicense 0m 15s The patch does not generate ASF License warnings.
        53m 3s



        Subsystem Report/Notes
        Docker Image:yetus/hadoop:9560f25
        JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12824180/YARN-3388-v6.patch
        JIRA Issue YARN-3388
        Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
        uname Linux 5522b3b206a6 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
        Build tool maven
        Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
        git revision trunk / 7f05ff7
        Default Java 1.8.0_101
        findbugs v3.0.0
        checkstyle https://builds.apache.org/job/PreCommit-YARN-Build/12806/artifact/patchprocess/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt
        Test Results https://builds.apache.org/job/PreCommit-YARN-Build/12806/testReport/
        modules C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
        Console output https://builds.apache.org/job/PreCommit-YARN-Build/12806/console
        Powered by Apache Yetus 0.3.0 http://yetus.apache.org

        This message was automatically generated.

        nroberts Nathan Roberts added a comment -

        Cleaned up most of the findbugs/checkstyle issues.

        hadoopqa Hadoop QA added a comment -
        -1 overall



        Vote Subsystem Runtime Comment
        0 reexec 0m 18s Docker mode activated.
        +1 @author 0m 0s The patch does not contain any @author tags.
        +1 test4tests 0m 0s The patch appears to include 2 new or modified test files.
        +1 mvninstall 6m 56s trunk passed
        +1 compile 0m 32s trunk passed
        +1 checkstyle 0m 26s trunk passed
        +1 mvnsite 0m 39s trunk passed
        +1 mvneclipse 0m 16s trunk passed
        +1 findbugs 0m 57s trunk passed
        +1 javadoc 0m 21s trunk passed
        +1 mvninstall 0m 31s the patch passed
        +1 compile 0m 29s the patch passed
        -1 javac 0m 29s hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager generated 5 new + 3 unchanged - 0 fixed = 8 total (was 3)
        -1 checkstyle 0m 24s hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: The patch generated 40 new + 515 unchanged - 2 fixed = 555 total (was 517)
        +1 mvnsite 0m 36s the patch passed
        +1 mvneclipse 0m 14s the patch passed
        +1 whitespace 0m 0s The patch has no whitespace issues.
        -1 findbugs 1m 3s hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager generated 1 new + 0 unchanged - 0 fixed = 1 total (was 0)
        +1 javadoc 0m 18s the patch passed
        +1 unit 37m 41s hadoop-yarn-server-resourcemanager in the patch passed.
        +1 asflicense 0m 17s The patch does not generate ASF License warnings.
        52m 36s



        Reason Tests
        FindBugs module:hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
          Primitive value is boxed then unboxed to perform primitive coercion in org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue$UsageRatios.getUsageRatio(String) At LeafQueue.java:unboxed to perform primitive coercion in org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue$UsageRatios.getUsageRatio(String) At LeafQueue.java:[line 1693]



        Subsystem Report/Notes
        Docker Image:yetus/hadoop:9560f25
        JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12823963/YARN-3388-v5.patch
        JIRA Issue YARN-3388
        Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
        uname Linux 3e7212c9c692 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
        Build tool maven
        Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
        git revision trunk / b047bc7
        Default Java 1.8.0_101
        findbugs v3.0.0
        javac https://builds.apache.org/job/PreCommit-YARN-Build/12788/artifact/patchprocess/diff-compile-javac-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt
        checkstyle https://builds.apache.org/job/PreCommit-YARN-Build/12788/artifact/patchprocess/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt
        findbugs https://builds.apache.org/job/PreCommit-YARN-Build/12788/artifact/patchprocess/new-findbugs-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.html
        Test Results https://builds.apache.org/job/PreCommit-YARN-Build/12788/testReport/
        modules C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
        Console output https://builds.apache.org/job/PreCommit-YARN-Build/12788/console
        Powered by Apache Yetus 0.3.0 http://yetus.apache.org

        This message was automatically generated.

        nroberts Nathan Roberts added a comment -

        Fixed the build error.

        hadoopqa Hadoop QA added a comment -
        -1 overall



        Vote Subsystem Runtime Comment
        0 reexec 0m 19s Docker mode activated.
        +1 @author 0m 0s The patch does not contain any @author tags.
        +1 test4tests 0m 0s The patch appears to include 2 new or modified test files.
        +1 mvninstall 6m 41s trunk passed
        +1 compile 0m 33s trunk passed
        +1 checkstyle 0m 26s trunk passed
        +1 mvnsite 0m 38s trunk passed
        +1 mvneclipse 0m 16s trunk passed
        +1 findbugs 0m 56s trunk passed
        +1 javadoc 0m 21s trunk passed
        -1 mvninstall 0m 18s hadoop-yarn-server-resourcemanager in the patch failed.
        -1 compile 0m 17s hadoop-yarn-server-resourcemanager in the patch failed.
        -1 javac 0m 17s hadoop-yarn-server-resourcemanager in the patch failed.
        -1 checkstyle 0m 25s hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: The patch generated 40 new + 515 unchanged - 2 fixed = 555 total (was 517)
        -1 mvnsite 0m 18s hadoop-yarn-server-resourcemanager in the patch failed.
        +1 mvneclipse 0m 14s the patch passed
        +1 whitespace 0m 0s The patch has no whitespace issues.
        -1 findbugs 0m 17s hadoop-yarn-server-resourcemanager in the patch failed.
        -1 javadoc 0m 17s hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager generated 2 new + 963 unchanged - 0 fixed = 965 total (was 963)
        -1 unit 0m 18s hadoop-yarn-server-resourcemanager in the patch failed.
        +1 asflicense 0m 15s The patch does not generate ASF License warnings.
        13m 23s



        Subsystem Report/Notes
        Docker Image:yetus/hadoop:9560f25
        JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12823947/YARN-3388-v4.patch
        JIRA Issue YARN-3388
        Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
        uname Linux 0675252b6286 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
        Build tool maven
        Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
        git revision trunk / b427ce1
        Default Java 1.8.0_101
        findbugs v3.0.0
        mvninstall https://builds.apache.org/job/PreCommit-YARN-Build/12787/artifact/patchprocess/patch-mvninstall-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt
        compile https://builds.apache.org/job/PreCommit-YARN-Build/12787/artifact/patchprocess/patch-compile-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt
        javac https://builds.apache.org/job/PreCommit-YARN-Build/12787/artifact/patchprocess/patch-compile-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt
        checkstyle https://builds.apache.org/job/PreCommit-YARN-Build/12787/artifact/patchprocess/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt
        mvnsite https://builds.apache.org/job/PreCommit-YARN-Build/12787/artifact/patchprocess/patch-mvnsite-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt
        findbugs https://builds.apache.org/job/PreCommit-YARN-Build/12787/artifact/patchprocess/patch-findbugs-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt
        javadoc https://builds.apache.org/job/PreCommit-YARN-Build/12787/artifact/patchprocess/diff-javadoc-javadoc-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt
        unit https://builds.apache.org/job/PreCommit-YARN-Build/12787/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt
        Test Results https://builds.apache.org/job/PreCommit-YARN-Build/12787/testReport/
        modules C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
        Console output https://builds.apache.org/job/PreCommit-YARN-Build/12787/console
        Powered by Apache Yetus 0.3.0 http://yetus.apache.org

        This message was automatically generated.

        nroberts Nathan Roberts added a comment -

        Thanks for the comments, Jason Lowe. Upmerged and added whitespace.

        jlowe Jason Lowe added a comment -

        Nathan Roberts, the patch no longer applies to trunk. Could you please rebase? Looks good overall, but it would be nice to have some whitespace around the calculateUserUsageRatio, recalculateQueueUsageRatio, and getUsageRatio methods for readability.

        nroberts Nathan Roberts added a comment -

        Wangda Tan, Eric Payne: OK, "soon" was extremely relative. Sorry about that.

        I think I addressed Wangda's comments, but I need label/partition experts to take a look.

        Any ideas why people don't hit this more often? We find it's very easy to get stuck at queueCapacity even though userLimitFactor and maxCapacity say the system should allocate further. Do you think people aren't using DRF and are mostly just using memory as the resource?

        nroberts Nathan Roberts added a comment -

        Thanks Wangda Tan for the comments. I agree 2b is the way to go. I will upload a new patch soon.

        hadoopqa Hadoop QA added a comment -



        -1 overall



        Vote Subsystem Runtime Comment
        0 pre-patch 15m 1s Pre-patch trunk compilation is healthy.
        +1 @author 0m 0s The patch does not contain any @author tags.
        +1 tests included 0m 0s The patch appears to include 2 new or modified test files.
        +1 javac 7m 49s There were no new javac warning messages.
        +1 javadoc 9m 58s There were no new javadoc warning messages.
        +1 release audit 0m 21s The applied patch does not increase the total number of release audit warnings.
        -1 checkstyle 0m 50s The applied patch generated 15 new checkstyle issues (total was 159, now 173).
        -1 whitespace 0m 2s The patch has 10 line(s) that end in whitespace. Use git apply --whitespace=fix.
        +1 install 1m 38s mvn install still works.
        +1 eclipse:eclipse 0m 33s The patch built with eclipse:eclipse.
        -1 findbugs 1m 21s The patch appears to introduce 1 new Findbugs (version 2.0.3) warnings.
        -1 yarn tests 48m 35s Tests failed in hadoop-yarn-server-resourcemanager.
            86m 12s  



        Reason Tests
        FindBugs module:hadoop-yarn-server-resourcemanager
          Unread field:LeafQueue.java:[line 1944]
        Failed unit tests hadoop.yarn.server.resourcemanager.scheduler.capacity.TestNodeLabelContainerAllocation
          hadoop.yarn.server.resourcemanager.scheduler.capacity.TestLeafQueue



        Subsystem Report/Notes
        Patch URL http://issues.apache.org/jira/secure/attachment/12732964/YARN-3388-v2.patch
        Optional Tests javadoc javac unit findbugs checkstyle
        git revision trunk / 09fe16f
        checkstyle https://builds.apache.org/job/PreCommit-YARN-Build/7943/artifact/patchprocess/diffcheckstylehadoop-yarn-server-resourcemanager.txt
        whitespace https://builds.apache.org/job/PreCommit-YARN-Build/7943/artifact/patchprocess/whitespace.txt
        Findbugs warnings https://builds.apache.org/job/PreCommit-YARN-Build/7943/artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html
        hadoop-yarn-server-resourcemanager test log https://builds.apache.org/job/PreCommit-YARN-Build/7943/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt
        Test Results https://builds.apache.org/job/PreCommit-YARN-Build/7943/testReport/
        Java 1.7.0_55
        uname Linux asf903.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
        Console output https://builds.apache.org/job/PreCommit-YARN-Build/7943/console

        This message was automatically generated.

        leftnoteasy Wangda Tan added a comment -

        Thanks for updating, Nathan Roberts. I took a look at the latest patch; some comments:
        1) It may be better to rename rbl to partitionResource in a couple of places; rbl is not a very clear name to me.

        2) One bigger problem: updateClusterResource only considers NO_LABEL, but computeUserLimit uses getUsageRatio for all partitions. It will be inaccurate if the resource of a partition is updated.
        The solution could be:
        a. Only use getUsageRatio when partition=NO_LABEL.
        b. Recompute all partitions in updateClusterResource.

        I prefer b since the other code paths in your patch all consider partitions (see the sketch after this list). You can take a look at CSQueueUtils#updateQueueStatistics; it should have very similar logic for handling partitions when the cluster resource updates.

        3) It's better not to put the user-usage-ratio in ResourceUsage; ResourceUsage is intended to track common resources for user/app/queue. I suggest creating a ResourceUsage-like structure in LeafQueue that User/LeafQueue will share.

        4) Better to split and rename User.updateUsageRatio into User.updateAndGetDeltaOfDominateResourceRatio and User.updateAndGetDominateResourceRatio; the "reset" parameter is not very straightforward to me.
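
        To make option b concrete, here is a minimal standalone sketch of recomputing per-partition usage ratios on a cluster resource update. Types and names are simplified placeholders, not the actual patch or CSQueueUtils code:

            import java.util.HashMap;
            import java.util.Map;

            class PartitionRatios {
              final Map<String, Float> ratioByPartition = new HashMap<>();

              // Option b: walk every known partition and rebuild its usage ratio
              // whenever the cluster (partition) resource changes.
              void updateClusterResource(Map<String, long[]> usedByPartition,
                  Map<String, long[]> partitionResources) {
                for (Map.Entry<String, long[]> e : partitionResources.entrySet()) {
                  long[] total = e.getValue();
                  long[] used =
                      usedByPartition.getOrDefault(e.getKey(), new long[total.length]);
                  float ratio = 0f;
                  for (int r = 0; r < total.length; r++) {
                    if (total[r] > 0) {
                      // dominant share: max over resource types of used/total
                      ratio = Math.max(ratio, (float) used[r] / total[r]);
                    }
                  }
                  ratioByPartition.put(e.getKey(), ratio);
                }
              }
            }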

        nroberts Nathan Roberts added a comment -

        Wangda Tan, please take a look at this version of the patch.

        nroberts Nathan Roberts added a comment -

        Yes. I have a patch which I think is close. I need to merge it to the latest trunk; then I'll post it for review.

        leftnoteasy Wangda Tan added a comment -

        Nathan Roberts, any updates on this?

        leftnoteasy Wangda Tan added a comment -

        Nathan Roberts,
        Makes sense to me. I would really appreciate it if you could wait for YARN-3361 to go in first, since it is almost closed and it blocks a couple of CS changes on our side. Once YARN-3361 is committed, I can help with supporting better DRF calculation for partitions as well (including support for the dominant consumed ratio for partitions) in CS.

        Thanks,

        nroberts Nathan Roberts added a comment -

        Thanks Wangda Tan for the comments.

        "when doing allocation under a labeled node, user-limit checking in the patch is incorrect."

        I don't think it's any more incorrect than it was prior to the patch. Both trunk and this patch use queueUsage.getUsed() to calculate currentCapacity; IIUC, this is wrong when looking at labeled nodes. Trunk is also using the partition from the resource request and not the partition from the node being evaluated, which I think is also incorrect. I think it's more correct after YARN-3361, but that's not there yet.

        I don't think I made things any worse than trunk is today, but I can wait until YARN-3361 is in if that will make things easier.

        I can change the name to include Dominant.

        The test case you mention should be in there. Without the fix, the following assert will fail because we can't get above the queue's capacity of 80%:

            assertTrue(
                "Expected AbsoluteUsedCapacity > 0.95, got: "
                    + b.getAbsoluteUsedCapacity(), b.getAbsoluteUsedCapacity() > 0.95);
        
        leftnoteasy Wangda Tan added a comment -

        Nathan Roberts,
        Thanks for updating. I took a look at your patch and the approach LGTM, but I think node labels should be considered in the same JIRA: when doing allocation under a labeled node, user-limit checking in the patch is incorrect. Actually, user-limit for exclusive node labels is already supported in the latest trunk, and user-limit for non-exclusive node labels is contained in YARN-3361; I think after YARN-3361, user-limit for node labels will be in good shape.
        Would you mind taking a look at the computeUserLimit method of the patch attached to YARN-3361?

        To support computing consumed-per-partition, User.updateUsageRatio needs to receive the partition as a parameter.

        Some other comments:

        • consumedRatio -> totalDominateConsumed or some other name; it's better to make the "sum of dominant consumed" explicit in the name.
        • consumed -> totalDominatedConsumedByPartition.
        • It's better to add a test case to make sure the allocation lockup described in https://issues.apache.org/jira/browse/YARN-3388?focusedCommentId=14376060&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14376060 will not happen.

        nroberts Nathan Roberts added a comment -

        Test failures don't appear related to the patch. I ran the failing tests locally and they pass.

        hadoopqa Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12709050/YARN-3388-v1.patch
        against trunk revision eccb7d4.

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 2 new or modified test files.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 javadoc. There were no new javadoc warning messages.

        +1 eclipse:eclipse. The patch built with eclipse:eclipse.

        +1 findbugs. The patch does not introduce any new Findbugs (version 2.0.3) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        -1 core tests. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

        org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification
        org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler
        org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesApps
        org.apache.hadoop.yarn.server.resourcemanager.TestRM

        The following test timeouts occurred in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

        org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestContainerAllocation

        Test results: https://builds.apache.org/job/PreCommit-YARN-Build/7201//testReport/
        Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7201//console

        This message is automatically generated.

        nroberts Nathan Roberts added a comment -

        Hi Wangda Tan. I uploaded a new version of the patch that addresses the inefficiency and adds unit tests.

        I think label support is better left for a separate JIRA once labels are fully working with user limits.

        nroberts Nathan Roberts added a comment -

        Wangda Tan - Thanks for the comments. I'm tweaking the patch to avoid summing so often.

        leftnoteasy Wangda Tan added a comment -

        Updated description.

        leftnoteasy Wangda Tan added a comment -

        Hi Nathan Roberts,
        Sorry for my late response, and thanks for reporting/working on this. I think your proposal should be good: it computes Σ(user.dominate_share), and the user with the smallest dominate_share can always continue.
        For the implementation:

        • updateConsumedRatio is called when clusterResource changes or when any resource is allocated, but it needs to loop over all users in the LeafQueue. This should be improved; there could be 100 or more users in a queue.

        I think a similar way is: we can save "user.dominate_share" in each user, and also total_dominate_share = Σ(user.dominate_share) in each LeafQueue. With this, we need only O(1) time when a resource is allocated/released and O(#users) time when clusterResource changes. Resource allocation/release seems more frequent than clusterResource changes to me. (A minimal sketch of this bookkeeping follows.)
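
        As an illustration only (names and types here are placeholders, not the eventual patch), the suggested bookkeeping could look like:

            import java.util.HashMap;
            import java.util.Map;

            class QueueDominantShares {
              private final Map<String, Float> shareByUser = new HashMap<>();
              private float totalDominantShare; // Σ over users, maintained incrementally

              static float dominantShare(long[] used, long[] clusterTotal) {
                float share = 0f;
                for (int r = 0; r < used.length; r++) {
                  if (clusterTotal[r] > 0) {
                    share = Math.max(share, (float) used[r] / clusterTotal[r]);
                  }
                }
                return share;
              }

              // O(1) per allocation/release: recompute one user's share and
              // apply only the delta to the queue-wide total.
              void onUserUsageChanged(String user, long[] userUsed, long[] clusterTotal) {
                float newShare = dominantShare(userUsed, clusterTotal);
                Float old = shareByUser.put(user, newShare);
                totalDominantShare += newShare - (old == null ? 0f : old);
              }

              // O(#users) only when the cluster resource itself changes.
              void onClusterResourceChanged(Map<String, long[]> usedByUser,
                  long[] clusterTotal) {
                totalDominantShare = 0f;
                for (Map.Entry<String, long[]> e : usedByUser.entrySet()) {
                  float share = dominantShare(e.getValue(), clusterTotal);
                  shareByUser.put(e.getKey(), share);
                  totalDominantShare += share;
                }
              }
            }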

        For label support, in addition to the above (if you think the above suggestion is fine), we need to record user.dominate_share-by-label and total_dominate_share-by-label, which could solve the user-limit-by-label problem.

        Please let me know your thoughts.

        Thanks,

        nroberts Nathan Roberts added a comment -

        Initial patch for comments on the approach. It seems to work well in basic testing on 2.6. I don't know how this interacts with label support + user limit, which I think is still lacking in some cases anyway. Hoping Wangda Tan and others can comment.

        nroberts Nathan Roberts added a comment -

        Example (there is a lot going on in this algorithm; I simplified to just the key pieces for clarity).
        Tuples are resources: [memory] or [memory,cpu].

        just memory:
        -----------------
        Queue Capacity is [100]
        2 active users, both request [10] at a time
        User1 is at [45]
        User2 is at [40]
        Limit is calculated to be 100/2=50, both users can allocate
        User2 goes to [50] - now used Capacity is 45+50=95
        Limit is still 50
        User1 goes to [55] - used Capacity now 50+55=105
        Limit is now 105/2
        User2 goes to [60] - used Capacity is now 60+55=115
        Limit is now 115/2
        So on and so forth until maxCapacity is hit.
        Notice how the users essentially leap frog one another, allowing the Limit to continually move higher.

        memory and cpu
        ------------------------
        Queue Capacity is [100,100]
        2 active users, User1 asks for [10,20], User2 asks for [20,10]
        User1 is at [35,45]
        User2 is at [45,35]
        Limit is calculated to be [100/2=50,100/2=50], both users can allocate
        User2 goes to [65,45] - used Capacity is now [65+35=100,45+45=90]
        Limit is still [50,50]
        User1 goes to [45,65] - used Capacity is now [65+45=110,45+65=110]
        Limit is now [110/2=55, 110/2=55]
        User1 and User2 are now both considered over limit and neither can allocate. User1 is over on cpu, User2 is over on memory.

        Open to suggestions on simple ways to fix this. I'm currently thinking a reasonable (simple, effective, computationally cheap, mostly fair) approach might be to give some small percentage of additional leeway for userLimit.
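
        To make the two walkthroughs above concrete, here is a small self-contained simulation of the simplified rule they use: limit = max(queueCapacity, usedCapacity) / #activeUsers per resource, and a user may allocate only while under the limit on every resource. It is illustrative only; it models neither maxCapacity nor the real CapacityScheduler code:

            public class UserLimitLeapFrog {

              // A user may allocate only while strictly under the limit on every resource.
              static boolean underLimit(int[] used, double[] limit) {
                for (int i = 0; i < used.length; i++) {
                  if (used[i] >= limit[i]) {
                    return false;
                  }
                }
                return true;
              }

              static void simulate(int[] capacity, int[][] users, int[][] asks) {
                for (int round = 0; round < 20; round++) {
                  boolean progress = false;
                  for (int u = 0; u < users.length; u++) {
                    // limit = max(queueCapacity, usedCapacity) / #activeUsers, per resource
                    double[] limit = new double[capacity.length];
                    for (int r = 0; r < capacity.length; r++) {
                      int used = 0;
                      for (int[] user : users) {
                        used += user[r];
                      }
                      limit[r] = Math.max(capacity[r], used) / (double) users.length;
                    }
                    if (underLimit(users[u], limit)) {
                      for (int r = 0; r < capacity.length; r++) {
                        users[u][r] += asks[u][r]; // grant this user's next container
                      }
                      progress = true;
                    }
                  }
                  if (!progress) {
                    System.out.println("stuck after round " + round + ": nobody is under the limit");
                    return;
                  }
                }
                System.out.println("kept allocating for 20 rounds (no deadlock)");
              }

              public static void main(String[] args) {
                // memory only: users leap-frog each other past the queue capacity
                simulate(new int[] {100}, new int[][] {{45}, {40}}, new int[][] {{10}, {10}});
                // memory + cpu: each user ends up over the limit on a different
                // resource and allocation stops near the queue capacity
                simulate(new int[] {100, 100},
                    new int[][] {{35, 45}, {45, 35}}, new int[][] {{10, 20}, {20, 10}});
              }
            }

        The second run halts with User1 at [45,65] and User2 at [65,45], exactly the stuck state described above, while the first run keeps leap-frogging upward.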


          People

          • Assignee: nroberts Nathan Roberts
          • Reporter: nroberts Nathan Roberts
          • Votes: 0
          • Watchers: 13
