  1. Hadoop YARN
  2. YARN-6081

LeafQueue#getTotalPendingResourcesConsideringUserLimit should deduct reserved from pending to avoid unnecessary preemption of reserved container

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.9.0, 3.0.0-alpha2
    • Component/s: None
    • Labels:
      None

      Description

      While testing YARN-5864, I found an issue that occurs when a queue's reserved resource is larger than its pending resource: PreemptionResourceCalculator will preempt the reserved container even if there is only one active queue in the cluster.

      To fix the problem, we need to deduct reserved from pending when computing the total pending resource for a LeafQueue.
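
      The deduction described above can be sketched roughly as follows. This is a simplified, memory-only model and not the actual LeafQueue code: the AppUsage class and the totalPending method are hypothetical names introduced for illustration, and the user-limit handling implied by the real method name is left out.

```java
import java.util.Arrays;
import java.util.List;

public class PendingMinusReserved {
    /** Hypothetical stand-in for one application's usage (memory only). */
    public static final class AppUsage {
        public final long pending;
        public final long reserved;

        public AppUsage(long pending, long reserved) {
            this.pending = pending;
            this.reserved = reserved;
        }
    }

    /**
     * Sums per-application pending resource. When deductReservedFromPending
     * is true, each application's reserved amount is subtracted first and
     * the result is clamped at zero so it never goes negative.
     */
    public static long totalPending(List<AppUsage> apps,
                                    boolean deductReservedFromPending) {
        long total = 0;
        for (AppUsage app : apps) {
            long pending = app.pending;
            if (deductReservedFromPending) {
                pending = Math.max(pending - app.reserved, 0);
            }
            total += pending;
        }
        return total;
    }

    public static void main(String[] args) {
        // One application whose reserved amount exceeds its pending amount.
        List<AppUsage> apps = Arrays.asList(new AppUsage(30, 50));
        System.out.println(totalPending(apps, false)); // prints 30
        System.out.println(totalPending(apps, true));  // prints 0
    }
}
```

      With the deduction enabled, a queue whose reservation already covers its outstanding requests reports no additional pending demand in this model.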

      1. YARN-6081.001.patch
        27 kB
        Wangda Tan
      2. YARN-6081.002.patch
        31 kB
        Wangda Tan

        Activity

        leftnoteasy Wangda Tan added a comment -

        This is the test case to reproduce the problem:

          @Test
          public void testPreemptionNotHappenForSingleReservedQueue() {
            Logger rootLogger = LogManager.getRootLogger();
            rootLogger.setLevel(Level.DEBUG);
        
            int[][] qData = new int[][]{
                //  /   A   B   C
                { 100, 40, 40, 20 },  // abs
                { 100, 100, 100, 100 },  // maxCap
                { 100,  70,  0,  0 },  // used
                {  10, 30,  0,  0 },  // pending
                {   0,  50,  0,  0 },  // reserved
                {   1,  1,  0,  0 },  // apps
                {  -1,  1,  1,  1 },  // req granularity
                {   3,  0,  0,  0 },  // subqueues
            };
            ProportionalCapacityPreemptionPolicy policy = buildPolicy(qData);
            policy.editSchedule();
            // ensure all pending rsrc from A get preempted from other queues
            verify(mDisp, times(0)).handle(argThat(new IsPreemptionRequestFor(appA)));
          }
        

        Please note that there is only one active queue, but the preemption policy still preempts containers from it.

        leftnoteasy Wangda Tan added a comment -

        Attached ver.1 patch.

        leftnoteasy Wangda Tan added a comment -

        Sunil G, Eric Payne, could you please review this fix? It would be better to commit it before YARN-5864.

        hadoopqa Hadoop QA added a comment -
        +1 overall



        Vote  Subsystem   Runtime  Comment
        0     reexec      0m 14s   Docker mode activated.
        +1    @author     0m 0s    The patch does not contain any @author tags.
        +1    test4tests  0m 0s    The patch appears to include 4 new or modified test files.
        +1    mvninstall  15m 11s  trunk passed
        +1    compile     0m 43s   trunk passed
        +1    checkstyle  0m 35s   trunk passed
        +1    mvnsite     0m 43s   trunk passed
        +1    mvneclipse  0m 18s   trunk passed
        +1    findbugs    1m 5s    trunk passed
        +1    javadoc     0m 22s   trunk passed
        +1    mvninstall  0m 34s   the patch passed
        +1    compile     0m 35s   the patch passed
        +1    javac       0m 35s   the patch passed
        -0    checkstyle  0m 31s   hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: The patch generated 13 new + 888 unchanged - 3 fixed = 901 total (was 891)
        +1    mvnsite     0m 37s   the patch passed
        +1    mvneclipse  0m 14s   the patch passed
        +1    whitespace  0m 0s    The patch has no whitespace issues.
        +1    findbugs    1m 10s   the patch passed
        +1    javadoc     0m 20s   the patch passed
        +1    unit        40m 1s   hadoop-yarn-server-resourcemanager in the patch passed.
        +1    asflicense  0m 17s   The patch does not generate ASF License warnings.
                          64m 48s



        Subsystem Report/Notes
        Docker Image:yetus/hadoop:a9ad5d6
        JIRA Issue YARN-6081
        JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12846734/YARN-6081.001.patch
        Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
        uname Linux 25d2db55cd11 3.13.0-95-generic #142-Ubuntu SMP Fri Aug 12 17:00:09 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
        Build tool maven
        Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
        git revision trunk / 4db119b
        Default Java 1.8.0_111
        findbugs v3.0.0
        checkstyle https://builds.apache.org/job/PreCommit-YARN-Build/14634/artifact/patchprocess/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt
        Test Results https://builds.apache.org/job/PreCommit-YARN-Build/14634/testReport/
        modules C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
        Console output https://builds.apache.org/job/PreCommit-YARN-Build/14634/console
        Powered by Apache Yetus 0.5.0-SNAPSHOT http://yetus.apache.org

        This message was automatically generated.

        sunilg Sunil G added a comment -

        Thanks Wangda Tan. Good catch!

        We decrement pending resources only when a container is allocated (not when it is reserved). So we have to deduct reserved memory, if any, from the pending resource. The idea makes sense to me.

        A few comments:
        1. getTotalPendingResourcesConsideringUserLimit is not part of this patch, but could it have a Javadoc comment as well? That would improve the Javadoc too.
        2.

          Resource pending = app.getAppAttemptResourceUsage().getPending(
              partition);
          if (deductReservedFromPending) {
            pending = Resources.subtract(pending,
                app.getAppAttemptResourceUsage().getReserved(partition));
          }
        

        I have one doubt here. pending holds a reference to the pending resource of the app attempt's resource usage. Inside the if (deductReservedFromPending) block, that reference is getting updated. Is that intentional?

        3.

          pending = Resources.max(resourceCalculator, lastClusterResource,
              pending, Resources.none());
        

        A quick doubt. Why are we using lastClusterResource here?

        4. In testPreemptionNotHappenForSingleReservedQueue, the comment near the verify block is confusing.
        5. In testPendingResourcesConsideringUserLimit, could we also assert the app's pending and reserved?
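
        The aliasing concern in point 2 can be illustrated with a small sketch. It uses a hypothetical mutable Res class, not YARN's Resource, and two hypothetical helpers: subtract, which returns a new object, and subtractFrom, which modifies its first argument in place. It assumes, as the reply in this thread states, that the subtract call used in the patch behaves like the copying variant.

```java
public class AliasingSketch {
    /** Hypothetical mutable resource holder (memory only). */
    public static final class Res {
        public long memory;

        public Res(long memory) {
            this.memory = memory;
        }
    }

    /** Copying subtract: returns a new object and leaves both arguments unchanged. */
    public static Res subtract(Res lhs, Res rhs) {
        return new Res(lhs.memory - rhs.memory);
    }

    /** In-place subtract: mutates lhs and returns it. */
    public static Res subtractFrom(Res lhs, Res rhs) {
        lhs.memory -= rhs.memory;
        return lhs;
    }

    public static void main(String[] args) {
        Res appPending = new Res(30);
        Res appReserved = new Res(50);

        // The local variable starts out as an alias of the app's own object.
        Res pending = appPending;
        // Copying subtract only rebinds the local variable.
        pending = subtract(pending, appReserved);
        System.out.println(appPending.memory + " " + pending.memory); // prints 30 -20

        // The in-place variant would change the app's own object.
        Res alias = appPending;
        subtractFrom(alias, appReserved);
        System.out.println(appPending.memory); // prints -20
    }
}
```

        In this model the application's own pending value stays untouched only when the copying variant is used, which is the distinction the question turns on.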

        leftnoteasy Wangda Tan added a comment -

        Thanks Sunil G for reviewing the patch.

        For 2), it uses Resources.subtract, so it will not touch the original value.
        For 3), updated to use componentwiseMax.

        For 1/4/5, addressed.
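
        The thread does not spell out why componentwiseMax was preferred for point 3. One plausible reading, sketched below with a hypothetical two-dimensional Res class and a hypothetical memory-only comparison (neither is YARN code), is that a whole-object max picks one operand as a unit and can let a negative component through, while a component-wise max clamps every dimension at zero and needs no cluster-wide resource to compare against.

```java
public class ComponentwiseMaxSketch {
    /** Hypothetical immutable resource with two dimensions. */
    public static final class Res {
        public final long memory;
        public final long vcores;

        public Res(long memory, long vcores) {
            this.memory = memory;
            this.vcores = vcores;
        }

        @Override
        public String toString() {
            return "<memory:" + memory + ", vcores:" + vcores + ">";
        }
    }

    /** Whole-object max: returns one operand, chosen by comparing memory only. */
    public static Res maxByMemory(Res a, Res b) {
        return a.memory >= b.memory ? a : b;
    }

    /** Component-wise max: takes the larger value in each dimension. */
    public static Res componentwiseMax(Res a, Res b) {
        return new Res(Math.max(a.memory, b.memory),
                       Math.max(a.vcores, b.vcores));
    }

    public static void main(String[] args) {
        Res none = new Res(0, 0);
        // Pending minus reserved, with one dimension gone negative.
        Res diff = new Res(10, -2);

        System.out.println(maxByMemory(diff, none));      // prints <memory:10, vcores:-2>
        System.out.println(componentwiseMax(diff, none)); // prints <memory:10, vcores:0>
    }
}
```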

        eepayne Eric Payne added a comment -

        Thanks Wangda Tan for fixing this. I am reviewing today. I will update later today or early tomorrow.

        hadoopqa Hadoop QA added a comment -
        -1 overall



        Vote  Subsystem   Runtime  Comment
        0     reexec      0m 20s   Docker mode activated.
        +1    @author     0m 0s    The patch does not contain any @author tags.
        +1    test4tests  0m 0s    The patch appears to include 4 new or modified test files.
        +1    mvninstall  14m 49s  trunk passed
        +1    compile     0m 37s   trunk passed
        +1    checkstyle  0m 35s   trunk passed
        +1    mvnsite     0m 38s   trunk passed
        +1    mvneclipse  0m 19s   trunk passed
        +1    findbugs    1m 10s   trunk passed
        +1    javadoc     0m 23s   trunk passed
        +1    mvninstall  0m 38s   the patch passed
        +1    compile     0m 34s   the patch passed
        +1    javac       0m 34s   the patch passed
        -0    checkstyle  0m 30s   hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: The patch generated 17 new + 931 unchanged - 3 fixed = 948 total (was 934)
        +1    mvnsite     0m 36s   the patch passed
        +1    mvneclipse  0m 15s   the patch passed
        +1    whitespace  0m 0s    The patch has no whitespace issues.
        +1    findbugs    1m 16s   the patch passed
        +1    javadoc     0m 21s   the patch passed
        -1    unit        41m 52s  hadoop-yarn-server-resourcemanager in the patch failed.
        +1    asflicense  0m 22s   The patch does not generate ASF License warnings.
                          66m 33s



        Reason Tests
        Failed junit tests hadoop.yarn.server.resourcemanager.TestRMRestart



        Subsystem Report/Notes
        Docker Image:yetus/hadoop:a9ad5d6
        JIRA Issue YARN-6081
        JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12847081/YARN-6081.002.patch
        Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
        uname Linux f62055017949 3.13.0-105-generic #152-Ubuntu SMP Fri Dec 2 15:37:11 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
        Build tool maven
        Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
        git revision trunk / e648b6e
        Default Java 1.8.0_111
        findbugs v3.0.0
        checkstyle https://builds.apache.org/job/PreCommit-YARN-Build/14641/artifact/patchprocess/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt
        unit https://builds.apache.org/job/PreCommit-YARN-Build/14641/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt
        Test Results https://builds.apache.org/job/PreCommit-YARN-Build/14641/testReport/
        modules C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
        Console output https://builds.apache.org/job/PreCommit-YARN-Build/14641/console
        Powered by Apache Yetus 0.5.0-SNAPSHOT http://yetus.apache.org

        This message was automatically generated.

        eepayne Eric Payne added a comment -

        +1 LGTM. The failed test (TestRMRestart) is not related to this patch.

        sunilg Sunil G added a comment - - edited

        Thanks Wangda Tan for the updated patch, and thanks Eric Payne for the review.

        +1 from my end also on the latest patch. I will commit it later today.

        sunilg Sunil G added a comment -

        Committed to trunk and branch-2.

        Thanks Wangda Tan for the patch and thanks Eric Payne for the additional review.

        Since the 2.8 release is currently in progress, does this need to go to branch-2.8?

        leftnoteasy Wangda Tan added a comment -

        Thanks for the review and commit, Sunil G and Eric Payne!


          People

          • Assignee:
            leftnoteasy Wangda Tan
            Reporter:
            leftnoteasy Wangda Tan
          • Votes:
            0
            Watchers:
            8