Hadoop YARN / YARN-5555

Scheduler UI: "% of Queue" is inaccurate if leaf queue is hierarchically nested.

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 2.8.0
    • Fix Version/s: 2.8.0, 3.0.0-alpha2
    • Component/s: None
    • Labels: None

      Description

      If a leaf queue is hierarchically nested (e.g., root.a.a1, root.a.a2), the values in the "% of Queue" column in the apps section of the Scheduler UI are calculated as if the leaf queue (a1) were a direct child of root.

      1. PctOfQueueIsInaccurate.jpg
        344 kB
        Eric Payne
      2. YARN-5555.001.patch
        9 kB
        Eric Payne

        Activity

        eepayne Eric Payne added a comment -

        The queue structure for the attached screenshot (PctOfQueueIsInaccurate.jpg) has the following attributes:

        Cluster Capacity   root.swords.capacity   root.swords.brisingr.capacity
        12288 MB           20%                    25%

        There are 3 apps running in the root.swords.brisingr queue. The attributes for each of these apps are as follows:

        App Name                         Allocated Memory   % of Queue
        application_1471969002932_0001   4608 MB            150.0
        application_1471969002932_0002   4608 MB            150.0
        application_1471969002932_0003   3072 MB            100.0

        The value to the right of the Queue: swords.brisingr bar graph says that the queue is 2001.3% used. This value is (almost) accurate: the capacity guaranteed to root.swords.brisingr is 12288 MB * 20% * 25% = 614.4 MB, and since root.swords.brisingr is consuming all 12288 MB, its usage is 12288 MB / 614.4 MB = 20, or 2000%.

        However, the sum of the % of Queue column for all apps running in root.swords.brisingr is 150.0% + 150.0% + 100.0% = 400%. This is inaccurate: the per-app values should add up to the queue's usage of roughly 2000%.

        It appears that the calculations are not taking into account the capacity of the parent queue, root.swords (20%). For example, application_1471969002932_0001's usage is 4608 MB, and 12288 MB * 25% = 3072 MB, so 4608 / 3072 = 1.5, or 150%. This calculation should have been 4608 / 614.4 = 7.5, or 750%.
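
The arithmetic above can be checked with a small standalone sketch. It is not part of the patch: the class and method names (PctOfQueueCheck, pctOfQueue) are made up for illustration, and the only inputs are the figures quoted in this comment.

```java
public class PctOfQueueCheck {

    // Percentage of a queue's guaranteed memory used by one app.
    // "capacities" are the configured fractions along the queue path,
    // root excluded (for example 0.20 for swords, 0.25 for brisingr).
    public static double pctOfQueue(double usedMb, double clusterMb, double... capacities) {
        double guaranteedMb = clusterMb;
        for (double c : capacities) {
            guaranteedMb *= c;
        }
        return usedMb / guaranteedMb * 100.0;
    }

    public static void main(String[] args) {
        double clusterMb = 12288;
        double[] appsMb = {4608, 4608, 3072};
        double reportedSum = 0;
        double expectedSum = 0;
        for (double usedMb : appsMb) {
            // What the UI shows: only the leaf queue's 25% is applied.
            double reported = pctOfQueue(usedMb, clusterMb, 0.25);
            // What it should show: the parent's 20% times the leaf's 25%.
            double expected = pctOfQueue(usedMb, clusterMb, 0.20, 0.25);
            reportedSum += reported;
            expectedSum += expected;
            System.out.printf("%.0f MB: reported %.1f%%, expected %.1f%%%n",
                    usedMb, reported, expected);
        }
        System.out.printf("sum: reported %.1f%%, expected %.1f%%%n",
                reportedSum, expectedSum);
    }
}
```

With both capacities applied, the three apps come to 750%, 750% and 500%, which sum to the 2000% derived for the queue as a whole.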

        RMAppsBlock#renderData is calling ApplicationResourceUsageReport, which eventually calls SchedulerApplicationAttempt#getResourceUsageReport.
        The following code in getResourceUsageReport, I think, needs to walk back up the parent tree to get all of the capacity values, not just the one for the leaf queue:

              queueUsagePerc =
                  calc.divide(cluster, usedResourceClone, Resources.multiply(cluster,
                      queue.getQueueInfo(false, false).getCapacity())) * 100;
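
One way to picture the suggestion of walking back up the parent tree is the sketch below, which multiplies the capacity fractions of every queue from the leaf up to root. The Queue class here is a hypothetical stand-in, not a YARN class or API, and this is not necessarily how the attached patch implements the fix.

```java
public class AbsoluteCapacitySketch {

    // Hypothetical stand-in for a scheduler queue; not a YARN class.
    public static final class Queue {
        final String name;
        final double capacity; // fraction of the parent's capacity, e.g. 0.25
        final Queue parent;    // null for root

        public Queue(String name, double capacity, Queue parent) {
            this.name = name;
            this.capacity = capacity;
            this.parent = parent;
        }
    }

    // Fraction of the whole cluster guaranteed to q:
    // the product of the capacities on the path from q up to root.
    public static double absoluteCapacity(Queue q) {
        double abs = 1.0;
        for (Queue cur = q; cur != null; cur = cur.parent) {
            abs *= cur.capacity;
        }
        return abs;
    }

    // Usage as a percentage of the leaf queue's guaranteed share of the cluster.
    public static double queueUsagePerc(double usedMb, double clusterMb, Queue leaf) {
        return usedMb / (clusterMb * absoluteCapacity(leaf)) * 100.0;
    }

    public static void main(String[] args) {
        Queue root = new Queue("root", 1.0, null);
        Queue swords = new Queue("swords", 0.20, root);
        Queue brisingr = new Queue("brisingr", 0.25, swords);
        System.out.printf("%.1f%%%n", queueUsagePerc(4608, 12288, brisingr));
    }
}
```

For the queue structure in this report, the product for root.swords.brisingr is 1.0 * 0.20 * 0.25 = 0.05, so a 4608 MB app works out to about 750% rather than the 150% obtained from the leaf capacity alone.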
        
        hadoopqa Hadoop QA added a comment -
        -1 overall



        Vote Subsystem Runtime Comment
        0 reexec 0m 20s Docker mode activated.
        +1 @author 0m 0s The patch does not contain any @author tags.
        +1 test4tests 0m 0s The patch appears to include 1 new or modified test files.
        +1 mvninstall 8m 16s trunk passed
        +1 compile 0m 38s trunk passed
        +1 checkstyle 0m 28s trunk passed
        +1 mvnsite 0m 46s trunk passed
        +1 mvneclipse 0m 17s trunk passed
        +1 findbugs 1m 5s trunk passed
        +1 javadoc 0m 24s trunk passed
        +1 mvninstall 0m 40s the patch passed
        +1 compile 0m 33s the patch passed
        +1 javac 0m 33s the patch passed
        -1 checkstyle 0m 24s hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: The patch generated 2 new + 434 unchanged - 0 fixed = 436 total (was 434)
        +1 mvnsite 0m 38s the patch passed
        +1 mvneclipse 0m 14s the patch passed
        +1 whitespace 0m 0s The patch has no whitespace issues.
        +1 findbugs 1m 1s the patch passed
        +1 javadoc 0m 18s the patch passed
        +1 unit 38m 3s hadoop-yarn-server-resourcemanager in the patch passed.
        +1 asflicense 0m 18s The patch does not generate ASF License warnings.
        55m 6s



        Subsystem Report/Notes
        Docker Image:yetus/hadoop:9560f25
        JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12826249/YARN-5555.001.patch
        JIRA Issue YARN-5555
        Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
        uname Linux c6993de44735 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
        Build tool maven
        Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
        git revision trunk / 9dcbdbd
        Default Java 1.8.0_101
        findbugs v3.0.0
        checkstyle https://builds.apache.org/job/PreCommit-YARN-Build/12952/artifact/patchprocess/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt
        Test Results https://builds.apache.org/job/PreCommit-YARN-Build/12952/testReport/
        modules C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
        Console output https://builds.apache.org/job/PreCommit-YARN-Build/12952/console
        Powered by Apache Yetus 0.3.0 http://yetus.apache.org

        This message was automatically generated.

        vvasudev Varun Vasudev added a comment -

        Thanks for the patch Eric Payne. +1. I'll commit it tomorrow if no one objects.

        vvasudev Varun Vasudev added a comment -

        Committed to trunk and branch-2. Thanks for the patch Eric Payne!

        hudson Hudson added a comment -

        SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #10389 (See https://builds.apache.org/job/Hadoop-trunk-Commit/10389/)
        YARN-5555. Scheduler UI: "% of Queue" is inaccurate if leaf queue is (vvasudev: rev 05f5c0f631680cffc36a79550c351620615445db)

        • (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestLeafQueue.java
        • (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/common/fica/FiCaSchedulerApp.java
        eepayne Eric Payne added a comment -

        Thank you very much, Varun Vasudev, for the review and the commit.

        eepayne Eric Payne added a comment -

        Any objections if I backport this to branch-2.8?

        vvasudev Varun Vasudev added a comment -

        Nope. I'm fine with it.

        eepayne Eric Payne added a comment -

        Thanks Varun Vasudev. I have backported this to 2.8.0.

        andrew.wang Andrew Wang added a comment -

        As a reminder, please set a 3.x fix version when committing too. Thanks!


          People

          • Assignee: eepayne Eric Payne
          • Reporter: eepayne Eric Payne
          • Votes: 0
          • Watchers: 10
