Hadoop YARN / YARN-5969

FairShareComparator: Cache value of getResourceUsage for better performance

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.7.1
    • Fix Version/s: 2.9.0, 3.0.0-alpha2
    • Component/s: fairscheduler
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      In the FairShareComparator class, the function getResourceUsage() performs very poorly. It is executed more than 100,000,000 times per second.
      In our environment, it takes 20 seconds out of every minute.
      A simple solution is to reduce the number of calls to the function.
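The idea of reducing the call count can be illustrated with a minimal sketch. This is not the committed patch: the `Schedulable` interface, `CountingQueue`, and both comparator classes below are hypothetical stand-ins, resource usage is modelled as a plain `long` rather than YARN's `Resource`, and the comparison rule is simplified. The sketch only shows that reading `getResourceUsage()` once per operand into a local variable avoids repeated calls within a single `compare`.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical stand-in for a schedulable entity; not the real YARN interface. */
interface Schedulable {
  long getResourceUsage(); // assumed to be expensive to compute
  long getMinShare();
}

/** Toy queue that recomputes its usage on every call and counts the calls. */
class CountingQueue implements Schedulable {
  static int usageCalls = 0;
  private final long minShare;
  private final long[] appUsages;

  CountingQueue(long minShare, long... appUsages) {
    this.minShare = minShare;
    this.appUsages = appUsages;
  }

  @Override
  public long getResourceUsage() {
    usageCalls++;
    long total = 0;
    for (long usage : appUsages) {
      total += usage;
    }
    return total;
  }

  @Override
  public long getMinShare() {
    return minShare;
  }
}

/** Calls getResourceUsage() every time the value is needed (up to 4 calls). */
class UncachedComparator implements Comparator<Schedulable> {
  @Override
  public int compare(Schedulable s1, Schedulable s2) {
    boolean s1Needy = s1.getResourceUsage() < s1.getMinShare();
    boolean s2Needy = s2.getResourceUsage() < s2.getMinShare();
    if (s1Needy && !s2Needy) return -1;
    if (s2Needy && !s1Needy) return 1;
    return Long.compare(s1.getResourceUsage(), s2.getResourceUsage());
  }
}

/** Reads each usage once into a local variable (always 2 calls). */
class CachedComparator implements Comparator<Schedulable> {
  @Override
  public int compare(Schedulable s1, Schedulable s2) {
    // Cache the values: getResourceUsage() is assumed to be costly.
    long usage1 = s1.getResourceUsage();
    long usage2 = s2.getResourceUsage();
    boolean s1Needy = usage1 < s1.getMinShare();
    boolean s2Needy = usage2 < s2.getMinShare();
    if (s1Needy && !s2Needy) return -1;
    if (s2Needy && !s1Needy) return 1;
    return Long.compare(usage1, usage2);
  }
}

class ComparatorCachingDemo {
  /** Sorts 100 toy queues and returns how often getResourceUsage() ran. */
  static int countCalls(Comparator<Schedulable> comparator) {
    List<Schedulable> queues = new ArrayList<>();
    for (int i = 0; i < 100; i++) {
      queues.add(new CountingQueue(50, i, 2 * i));
    }
    CountingQueue.usageCalls = 0;
    queues.sort(comparator);
    return CountingQueue.usageCalls;
  }

  public static void main(String[] args) {
    System.out.println("uncached calls: " + countCalls(new UncachedComparator()));
    System.out.println("cached calls:   " + countCalls(new CachedComparator()));
  }
}
```

Both comparators produce the same ordering. The cached variant simply bounds the number of `getResourceUsage()` calls at two per comparison, which matters when the comparator runs inside a sort over many queues and apps.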

      1. YARN-5969.patch
        3 kB
        zhangshilong
      2. 20161222.patch
        3 kB
        zhangshilong
      3. containerAllocated_after.png
        212 kB
        zhangshilong
      4. apprunning_after.png
        184 kB
        zhangshilong
      5. pending_before.png
        123 kB
        zhangshilong
      6. pending_after.png
        130 kB
        zhangshilong
      7. containerAllocatedDelta_before.png
        184 kB
        zhangshilong
      8. apprunning_before.png
        164 kB
        zhangshilong
      9. 20161206.patch
        2 kB
        zhangshilong

        Activity

        yufeigu Yufei Gu added a comment -

        Absolutely! zhangshilong, thanks for working on this. Any contribution to the community is welcome!

        zsl2007 zhangshilong added a comment -

        Thanks to Yufei Gu for the advice and review, and to Karthik Kambatla for the commit.
        Our YARN scale has reached nearly 4000 in our company, and we have run into many FairScheduler performance problems. I hope to submit more optimizations to the community.

        hudson Hudson added a comment -

        SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #11042 (See https://builds.apache.org/job/Hadoop-trunk-Commit/11042/)
        YARN-5969. FairShareComparator: Cache value of getResourceUsage for (kasha: rev c3973e7080bf71b57ace4a6adf4bb43f3c5d35b5)

        • (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/policies/FairSharePolicy.java
        kasha Karthik Kambatla added a comment -

        Just committed this to trunk and branch-2. Thanks zhangshilong for the patch and Yufei Gu for the review.

        kasha Karthik Kambatla added a comment -

        +1. Checking this in.

        yufeigu Yufei Gu added a comment -

        Thanks zhangshilong for the new patch. LGTM. +1 (non-binding). Would any committer take a look? cc Karthik Kambatla, Daniel Templeton.

        hadoopqa Hadoop QA added a comment -
        -1 overall



        Vote Subsystem Runtime Comment
        0 reexec 0m 10s Docker mode activated.
        +1 @author 0m 0s The patch does not contain any @author tags.
        -1 test4tests 0m 0s The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
        +1 mvninstall 12m 25s trunk passed
        +1 compile 0m 31s trunk passed
        +1 checkstyle 0m 20s trunk passed
        +1 mvnsite 0m 34s trunk passed
        +1 mvneclipse 0m 16s trunk passed
        +1 findbugs 0m 58s trunk passed
        +1 javadoc 0m 20s trunk passed
        +1 mvninstall 0m 30s the patch passed
        +1 compile 0m 29s the patch passed
        +1 javac 0m 29s the patch passed
        +1 checkstyle 0m 17s the patch passed
        +1 mvnsite 0m 31s the patch passed
        +1 mvneclipse 0m 14s the patch passed
        +1 whitespace 0m 0s The patch has no whitespace issues.
        +1 findbugs 1m 5s the patch passed
        +1 javadoc 0m 18s the patch passed
        -1 unit 38m 32s hadoop-yarn-server-resourcemanager in the patch failed.
        +1 asflicense 0m 16s The patch does not generate ASF License warnings.
        58m 59s



        Reason Tests
        Failed junit tests hadoop.yarn.server.resourcemanager.TestRMRestart



        Subsystem Report/Notes
        Docker Image:yetus/hadoop:a9ad5d6
        JIRA Issue YARN-5969
        JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12844728/YARN-5969.patch
        Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
        uname Linux f3225c8a2434 3.13.0-93-generic #140-Ubuntu SMP Mon Jul 18 21:21:05 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
        Build tool maven
        Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
        git revision trunk / ea54752
        Default Java 1.8.0_111
        findbugs v3.0.0
        unit https://builds.apache.org/job/PreCommit-YARN-Build/14476/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt
        Test Results https://builds.apache.org/job/PreCommit-YARN-Build/14476/testReport/
        modules C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
        Console output https://builds.apache.org/job/PreCommit-YARN-Build/14476/console
        Powered by Apache Yetus 0.5.0-SNAPSHOT http://yetus.apache.org

        This message was automatically generated.

        zsl2007 zhangshilong added a comment -

        My fault, I used the wrong git address: https://github.com/apache/hadoop-common.git.
        I have submitted a new patch using the git address: https://github.com/apache/hadoop.git.

        yufeigu Yufei Gu added a comment -

        Seems like the patch doesn't apply to trunk. Would you please revise it?

        tangzhankun Zhankun Tang added a comment -

        zhangshilong, Well done and thanks for the patch!

        hadoopqa Hadoop QA added a comment -
        -1 overall



        Vote Subsystem Runtime Comment
        0 reexec 0m 0s Docker mode activated.
        -1 patch 0m 5s YARN-5969 does not apply to trunk. Rebase required? Wrong Branch? See https://wiki.apache.org/hadoop/HowToContribute for help.



        Subsystem Report/Notes
        JIRA Issue YARN-5969
        JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12844359/20161222.patch
        Console output https://builds.apache.org/job/PreCommit-YARN-Build/14441/console
        Powered by Apache Yetus 0.5.0-SNAPSHOT http://yetus.apache.org

        This message was automatically generated.

        yufeigu Yufei Gu added a comment -

        Submit the patch to let Hadoop QA kick in.

        zsl2007 zhangshilong added a comment -


        Thanks Yufei Gu for the reminder. I will improve my patch soon.

        yufeigu Yufei Gu added a comment -

        Thanks zhangshilong. I like the figures, especially the one showing the number of containers allocated per minute. Based on your figures, it did improve a lot at this scale!

        The patch looks good to me generally. Minor nits:
        1. Use meaningful variable names instead of "u1" and "u2". It is OK to leave "s1"/"s2" since they are already there.
        2. It would be nice to add a comment explaining why you do this before the following code.

             Resource u1 = s1.getResourceUsage();
             Resource u2 = s2.getResourceUsage();
        
        zsl2007 zhangshilong added a comment - - edited

        The containerAllocated figures show container allocations per minute.
        After the patch, container allocations per minute improve by about 50%.
        The 500 apps also clearly finish faster after the patch.

        zsl2007 zhangshilong added a comment -

        Test case: 500 apps, 3000 NM nodes
        Queues:
        Number of parent queues: 100
        Number of leaf queues per parent queue: 5
        500 apps were submitted to 155 leaf queues. On average, a queue contains 4 apps.
        All apps are MapReduce jobs. Each job contains 325 mappers and 44 reducers. Every mapper/reducer just sleeps for 20 seconds.

        yufeigu Yufei Gu added a comment -

        Thanks zhangshilong for reporting this issue and providing a patch. Can you add a performance comparison before and after your patch?

        Can any committer/administrator add zhangshilong as a contributor? Thanks. cc Karthik Kambatla, Daniel Templeton.


          People

          • Assignee:
            zsl2007 zhangshilong
            Reporter:
            zsl2007 zhangshilong
          • Votes:
            0
            Watchers:
            14
