Uploaded image for project: 'Hadoop YARN'
  1. Hadoop YARN
  2. YARN-6210

FS: Node reservations can interfere with preemption

Details

    • Bug
    • Status: Resolved
    • Major
    • Resolution: Fixed
    • 2.9.0
    • 2.9.0, 3.0.0-alpha4
    • fairscheduler
    • None
    • Reviewed

    Description

      Today, on a saturated cluster, apps with pending demand reserve nodes. A new app might not be able to preempt resources because these nodes are already reserved. This can be reproduced by the example in YARN-6151.

      Since node reservations are to prevent starvation of apps requesting large containers, triggering these reservations only on starved applications would avoid this situation.

      Attachments

        1. YARN-6210.1.patch
          16 kB
          Karthik Kambatla
        2. YARN-6210.2.patch
          26 kB
          Karthik Kambatla
        3. YARN-6210.3.patch
          27 kB
          Karthik Kambatla
        4. YARN-6210.4.patch
          27 kB
          Karthik Kambatla

        Issue Links

          Activity

            Patch (v1) does the following:

            1. Only starved apps can reserve nodes
            2. The starvation check considers a single resource. For instance, in the case of DRF, this corresponds to the dominant resource: (1) the app is considered starved if it is starved on the dominant resource, (2) an app can be preempted as long as it is not starved on the dominant resource.
            3. FSAppAttempt#toString for easier debugging of tests
            4. DominantResourceFairnessComparator#compare can return 0 if the submit time is the same. This leads to issues while allocating containers. Changed this to be in line with FairShareComparator and return either -1 or 1.
            5. Added the test from YARN-6151. FYI. yufeigu
            kasha Karthik Kambatla added a comment - Patch (v1) does the following: Only starved apps can reserve nodes The starvation check considers a single resource. For instance, in the case of DRF, this corresponds to the dominant resource: (1) the app is considered starved if it is starved on the dominant resource, (2) an app can be preempted as long as it is not starved on the dominant resource. FSAppAttempt#toString for easier debugging of tests DominantResourceFairnessComparator#compare can return 0 if the submit time is the same. This leads to issues while allocating containers. Changed this to be in line with FairShareComparator and return either -1 or 1. Added the test from YARN-6151 . FYI. yufeigu
            hadoopqa Hadoop QA added a comment -
            -1 overall



            Vote Subsystem Runtime Comment
            0 reexec 0m 16s Docker mode activated.
            +1 @author 0m 0s The patch does not contain any @author tags.
            +1 test4tests 0m 0s The patch appears to include 1 new or modified test files.
            0 mvndep 0m 46s Maven dependency ordering for branch
            +1 mvninstall 14m 17s trunk passed
            +1 compile 8m 35s trunk passed
            +1 checkstyle 0m 54s trunk passed
            +1 mvnsite 1m 22s trunk passed
            +1 mvneclipse 0m 42s trunk passed
            +1 findbugs 2m 38s trunk passed
            +1 javadoc 1m 6s trunk passed
            0 mvndep 0m 11s Maven dependency ordering for patch
            +1 mvninstall 1m 7s the patch passed
            +1 compile 6m 59s the patch passed
            +1 javac 6m 59s the patch passed
            +1 checkstyle 0m 59s the patch passed
            +1 mvnsite 1m 35s the patch passed
            +1 mvneclipse 0m 40s the patch passed
            +1 whitespace 0m 0s The patch has no whitespace issues.
            +1 findbugs 3m 11s the patch passed
            -1 javadoc 0m 39s hadoop-yarn-project_hadoop-yarn_hadoop-yarn-common generated 13 new + 4575 unchanged - 0 fixed = 4588 total (was 4575)
            +1 unit 2m 57s hadoop-yarn-common in the patch passed.
            -1 unit 42m 34s hadoop-yarn-server-resourcemanager in the patch failed.
            +1 asflicense 0m 32s The patch does not generate ASF License warnings.
            101m 3s



            Reason Tests
            Failed junit tests hadoop.yarn.server.resourcemanager.TestRMRestart
              hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler
              hadoop.yarn.server.resourcemanager.TestRMAdminService



            Subsystem Report/Notes
            Docker Image:yetus/hadoop:a9ad5d6
            JIRA Issue YARN-6210
            JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12853639/YARN-6210.1.patch
            Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
            uname Linux 7273cca2149a 3.13.0-105-generic #152-Ubuntu SMP Fri Dec 2 15:37:11 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
            Build tool maven
            Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
            git revision trunk / 8035749
            Default Java 1.8.0_121
            findbugs v3.0.0
            javadoc https://builds.apache.org/job/PreCommit-YARN-Build/15017/artifact/patchprocess/diff-javadoc-javadoc-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-common.txt
            unit https://builds.apache.org/job/PreCommit-YARN-Build/15017/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt
            Test Results https://builds.apache.org/job/PreCommit-YARN-Build/15017/testReport/
            modules C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager U: hadoop-yarn-project/hadoop-yarn
            Console output https://builds.apache.org/job/PreCommit-YARN-Build/15017/console
            Powered by Apache Yetus 0.5.0-SNAPSHOT http://yetus.apache.org

            This message was automatically generated.

            hadoopqa Hadoop QA added a comment - -1 overall Vote Subsystem Runtime Comment 0 reexec 0m 16s Docker mode activated. +1 @author 0m 0s The patch does not contain any @author tags. +1 test4tests 0m 0s The patch appears to include 1 new or modified test files. 0 mvndep 0m 46s Maven dependency ordering for branch +1 mvninstall 14m 17s trunk passed +1 compile 8m 35s trunk passed +1 checkstyle 0m 54s trunk passed +1 mvnsite 1m 22s trunk passed +1 mvneclipse 0m 42s trunk passed +1 findbugs 2m 38s trunk passed +1 javadoc 1m 6s trunk passed 0 mvndep 0m 11s Maven dependency ordering for patch +1 mvninstall 1m 7s the patch passed +1 compile 6m 59s the patch passed +1 javac 6m 59s the patch passed +1 checkstyle 0m 59s the patch passed +1 mvnsite 1m 35s the patch passed +1 mvneclipse 0m 40s the patch passed +1 whitespace 0m 0s The patch has no whitespace issues. +1 findbugs 3m 11s the patch passed -1 javadoc 0m 39s hadoop-yarn-project_hadoop-yarn_hadoop-yarn-common generated 13 new + 4575 unchanged - 0 fixed = 4588 total (was 4575) +1 unit 2m 57s hadoop-yarn-common in the patch passed. -1 unit 42m 34s hadoop-yarn-server-resourcemanager in the patch failed. +1 asflicense 0m 32s The patch does not generate ASF License warnings. 101m 3s Reason Tests Failed junit tests hadoop.yarn.server.resourcemanager.TestRMRestart   hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler   hadoop.yarn.server.resourcemanager.TestRMAdminService Subsystem Report/Notes Docker Image:yetus/hadoop:a9ad5d6 JIRA Issue YARN-6210 JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12853639/YARN-6210.1.patch Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle uname Linux 7273cca2149a 3.13.0-105-generic #152-Ubuntu SMP Fri Dec 2 15:37:11 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux Build tool maven Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh git revision trunk / 8035749 Default Java 1.8.0_121 findbugs v3.0.0 javadoc https://builds.apache.org/job/PreCommit-YARN-Build/15017/artifact/patchprocess/diff-javadoc-javadoc-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-common.txt unit https://builds.apache.org/job/PreCommit-YARN-Build/15017/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt Test Results https://builds.apache.org/job/PreCommit-YARN-Build/15017/testReport/ modules C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager U: hadoop-yarn-project/hadoop-yarn Console output https://builds.apache.org/job/PreCommit-YARN-Build/15017/console Powered by Apache Yetus 0.5.0-SNAPSHOT http://yetus.apache.org This message was automatically generated.

            Patch (v2) fixes:

            1. The javadoc warnings
            2. TestFairScheduler tests around reservation. It was very hard to understand the intent of failing tests. Updated the tests based on my understanding.
            kasha Karthik Kambatla added a comment - Patch (v2) fixes: The javadoc warnings TestFairScheduler tests around reservation. It was very hard to understand the intent of failing tests. Updated the tests based on my understanding.
            hadoopqa Hadoop QA added a comment -
            -1 overall



            Vote Subsystem Runtime Comment
            0 reexec 0m 24s Docker mode activated.
            +1 @author 0m 0s The patch does not contain any @author tags.
            +1 test4tests 0m 0s The patch appears to include 2 new or modified test files.
            0 mvndep 0m 8s Maven dependency ordering for branch
            +1 mvninstall 13m 5s trunk passed
            +1 compile 8m 9s trunk passed
            +1 checkstyle 0m 50s trunk passed
            +1 mvnsite 1m 14s trunk passed
            +1 mvneclipse 0m 39s trunk passed
            +1 findbugs 2m 19s trunk passed
            +1 javadoc 1m 0s trunk passed
            0 mvndep 0m 10s Maven dependency ordering for patch
            +1 mvninstall 0m 58s the patch passed
            +1 compile 6m 30s the patch passed
            +1 javac 6m 30s the patch passed
            +1 checkstyle 0m 49s the patch passed
            +1 mvnsite 1m 12s the patch passed
            +1 mvneclipse 0m 37s the patch passed
            +1 whitespace 0m 0s The patch has no whitespace issues.
            +1 findbugs 2m 29s the patch passed
            +1 javadoc 0m 59s the patch passed
            +1 unit 2m 29s hadoop-yarn-common in the patch passed.
            -1 unit 41m 1s hadoop-yarn-server-resourcemanager in the patch failed.
            +1 asflicense 0m 30s The patch does not generate ASF License warnings.
            94m 4s



            Reason Tests
            Failed junit tests hadoop.yarn.server.resourcemanager.TestRMRestart
              hadoop.yarn.server.resourcemanager.security.TestDelegationTokenRenewer



            Subsystem Report/Notes
            Docker Image:yetus/hadoop:a9ad5d6
            JIRA Issue YARN-6210
            JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12853691/YARN-6210.2.patch
            Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
            uname Linux 9ff8da4ca724 3.13.0-106-generic #153-Ubuntu SMP Tue Dec 6 15:44:32 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
            Build tool maven
            Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
            git revision trunk / 6ba61d2
            Default Java 1.8.0_121
            findbugs v3.0.0
            unit https://builds.apache.org/job/PreCommit-YARN-Build/15021/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt
            Test Results https://builds.apache.org/job/PreCommit-YARN-Build/15021/testReport/
            modules C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager U: hadoop-yarn-project/hadoop-yarn
            Console output https://builds.apache.org/job/PreCommit-YARN-Build/15021/console
            Powered by Apache Yetus 0.5.0-SNAPSHOT http://yetus.apache.org

            This message was automatically generated.

            hadoopqa Hadoop QA added a comment - -1 overall Vote Subsystem Runtime Comment 0 reexec 0m 24s Docker mode activated. +1 @author 0m 0s The patch does not contain any @author tags. +1 test4tests 0m 0s The patch appears to include 2 new or modified test files. 0 mvndep 0m 8s Maven dependency ordering for branch +1 mvninstall 13m 5s trunk passed +1 compile 8m 9s trunk passed +1 checkstyle 0m 50s trunk passed +1 mvnsite 1m 14s trunk passed +1 mvneclipse 0m 39s trunk passed +1 findbugs 2m 19s trunk passed +1 javadoc 1m 0s trunk passed 0 mvndep 0m 10s Maven dependency ordering for patch +1 mvninstall 0m 58s the patch passed +1 compile 6m 30s the patch passed +1 javac 6m 30s the patch passed +1 checkstyle 0m 49s the patch passed +1 mvnsite 1m 12s the patch passed +1 mvneclipse 0m 37s the patch passed +1 whitespace 0m 0s The patch has no whitespace issues. +1 findbugs 2m 29s the patch passed +1 javadoc 0m 59s the patch passed +1 unit 2m 29s hadoop-yarn-common in the patch passed. -1 unit 41m 1s hadoop-yarn-server-resourcemanager in the patch failed. +1 asflicense 0m 30s The patch does not generate ASF License warnings. 94m 4s Reason Tests Failed junit tests hadoop.yarn.server.resourcemanager.TestRMRestart   hadoop.yarn.server.resourcemanager.security.TestDelegationTokenRenewer Subsystem Report/Notes Docker Image:yetus/hadoop:a9ad5d6 JIRA Issue YARN-6210 JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12853691/YARN-6210.2.patch Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle uname Linux 9ff8da4ca724 3.13.0-106-generic #153-Ubuntu SMP Tue Dec 6 15:44:32 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux Build tool maven Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh git revision trunk / 6ba61d2 Default Java 1.8.0_121 findbugs v3.0.0 unit https://builds.apache.org/job/PreCommit-YARN-Build/15021/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt Test Results https://builds.apache.org/job/PreCommit-YARN-Build/15021/testReport/ modules C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager U: hadoop-yarn-project/hadoop-yarn Console output https://builds.apache.org/job/PreCommit-YARN-Build/15021/console Powered by Apache Yetus 0.5.0-SNAPSHOT http://yetus.apache.org This message was automatically generated.
            hadoopqa Hadoop QA added a comment -
            -1 overall



            Vote Subsystem Runtime Comment
            0 reexec 0m 12s Docker mode activated.
            +1 @author 0m 0s The patch does not contain any @author tags.
            +1 test4tests 0m 0s The patch appears to include 2 new or modified test files.
            0 mvndep 0m 45s Maven dependency ordering for branch
            +1 mvninstall 12m 44s trunk passed
            +1 compile 7m 56s trunk passed
            +1 checkstyle 0m 49s trunk passed
            +1 mvnsite 1m 12s trunk passed
            +1 mvneclipse 0m 37s trunk passed
            +1 findbugs 2m 7s trunk passed
            +1 javadoc 1m 0s trunk passed
            0 mvndep 0m 9s Maven dependency ordering for patch
            +1 mvninstall 0m 56s the patch passed
            +1 compile 6m 3s the patch passed
            +1 javac 6m 3s the patch passed
            +1 checkstyle 0m 48s the patch passed
            +1 mvnsite 1m 9s the patch passed
            +1 mvneclipse 0m 36s the patch passed
            +1 whitespace 0m 0s The patch has no whitespace issues.
            +1 findbugs 2m 19s the patch passed
            +1 javadoc 0m 57s the patch passed
            +1 unit 2m 24s hadoop-yarn-common in the patch passed.
            -1 unit 54m 55s hadoop-yarn-server-resourcemanager in the patch failed.
            +1 asflicense 0m 31s The patch does not generate ASF License warnings.
            106m 43s



            Reason Tests
            Failed junit tests hadoop.yarn.server.resourcemanager.TestRMRestart
            Timed out junit tests org.apache.hadoop.yarn.server.resourcemanager.reservation.TestFairSchedulerPlanFollower



            Subsystem Report/Notes
            Docker Image:yetus/hadoop:a9ad5d6
            JIRA Issue YARN-6210
            JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12853691/YARN-6210.2.patch
            Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
            uname Linux d9dbfd9effa5 3.13.0-106-generic #153-Ubuntu SMP Tue Dec 6 15:44:32 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
            Build tool maven
            Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
            git revision trunk / 6ba61d2
            Default Java 1.8.0_121
            findbugs v3.0.0
            unit https://builds.apache.org/job/PreCommit-YARN-Build/15022/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt
            Test Results https://builds.apache.org/job/PreCommit-YARN-Build/15022/testReport/
            modules C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager U: hadoop-yarn-project/hadoop-yarn
            Console output https://builds.apache.org/job/PreCommit-YARN-Build/15022/console
            Powered by Apache Yetus 0.5.0-SNAPSHOT http://yetus.apache.org

            This message was automatically generated.

            hadoopqa Hadoop QA added a comment - -1 overall Vote Subsystem Runtime Comment 0 reexec 0m 12s Docker mode activated. +1 @author 0m 0s The patch does not contain any @author tags. +1 test4tests 0m 0s The patch appears to include 2 new or modified test files. 0 mvndep 0m 45s Maven dependency ordering for branch +1 mvninstall 12m 44s trunk passed +1 compile 7m 56s trunk passed +1 checkstyle 0m 49s trunk passed +1 mvnsite 1m 12s trunk passed +1 mvneclipse 0m 37s trunk passed +1 findbugs 2m 7s trunk passed +1 javadoc 1m 0s trunk passed 0 mvndep 0m 9s Maven dependency ordering for patch +1 mvninstall 0m 56s the patch passed +1 compile 6m 3s the patch passed +1 javac 6m 3s the patch passed +1 checkstyle 0m 48s the patch passed +1 mvnsite 1m 9s the patch passed +1 mvneclipse 0m 36s the patch passed +1 whitespace 0m 0s The patch has no whitespace issues. +1 findbugs 2m 19s the patch passed +1 javadoc 0m 57s the patch passed +1 unit 2m 24s hadoop-yarn-common in the patch passed. -1 unit 54m 55s hadoop-yarn-server-resourcemanager in the patch failed. +1 asflicense 0m 31s The patch does not generate ASF License warnings. 106m 43s Reason Tests Failed junit tests hadoop.yarn.server.resourcemanager.TestRMRestart Timed out junit tests org.apache.hadoop.yarn.server.resourcemanager.reservation.TestFairSchedulerPlanFollower Subsystem Report/Notes Docker Image:yetus/hadoop:a9ad5d6 JIRA Issue YARN-6210 JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12853691/YARN-6210.2.patch Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle uname Linux d9dbfd9effa5 3.13.0-106-generic #153-Ubuntu SMP Tue Dec 6 15:44:32 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux Build tool maven Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh git revision trunk / 6ba61d2 Default Java 1.8.0_121 findbugs v3.0.0 unit https://builds.apache.org/job/PreCommit-YARN-Build/15022/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt Test Results https://builds.apache.org/job/PreCommit-YARN-Build/15022/testReport/ modules C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager U: hadoop-yarn-project/hadoop-yarn Console output https://builds.apache.org/job/PreCommit-YARN-Build/15022/console Powered by Apache Yetus 0.5.0-SNAPSHOT http://yetus.apache.org This message was automatically generated.

            Thanks for the patch, kasha. A few comments:

            • ResourceCalculator.compare(): singleType needs to be explained better. I still don't know what it means.
            • TestFailSchedulerPreemption.writeAllocFile(): should preemptable maybe be preemptable-1?
            • FSAppAttempt.isStarvedForFairShare(): No likey the "0 > ..." There's no reason for the weird inverted order, and less than x is easier to logic about.
            • FSAppAttempt.isStarved(): looks like it can be private...
            • FSAppAttempt.fairShareStarvation(): the meaning of starved changed. It would be nice to document in a comment why it's OK that starved doesn't account for the threshold whereas the return value does.
            • TestFairScheduler.testReservationWhileMultiplePriorities(): The comment Create one application and take up all resources isn't actually true, is it?
            • TestFairScheduler.testReservationWhileMultiplePriorities(): Looks like you took out the tests that the scheduler resources are as expected at the various steps and the test that the reservation is still for the lower priority request. Not critical, but I would leave them in if it were me.
            • TestFairScheduler.testReservationsStrictLocality: I like messages in my asserts.
            templedf Daniel Templeton added a comment - Thanks for the patch, kasha . A few comments: ResourceCalculator.compare() : singleType needs to be explained better. I still don't know what it means. TestFailSchedulerPreemption.writeAllocFile() : should preemptable maybe be preemptable-1 ? FSAppAttempt.isStarvedForFairShare() : No likey the "0 > ..." There's no reason for the weird inverted order, and less than x is easier to logic about. FSAppAttempt.isStarved() : looks like it can be private... FSAppAttempt.fairShareStarvation() : the meaning of starved changed. It would be nice to document in a comment why it's OK that starved doesn't account for the threshold whereas the return value does. TestFairScheduler.testReservationWhileMultiplePriorities() : The comment Create one application and take up all resources isn't actually true, is it? TestFairScheduler.testReservationWhileMultiplePriorities() : Looks like you took out the tests that the scheduler resources are as expected at the various steps and the test that the reservation is still for the lower priority request. Not critical, but I would leave them in if it were me. TestFairScheduler.testReservationsStrictLocality : I like messages in my asserts.

            Thanks for the prompt review, Daniel.

            The updated patch incorporates all your suggestions but for the following:

            1. FSAppAttempt#fairShareStarvation(): I have reverted the meaning of starved. The reason for this is to ensure we don't mark an app whose demand is fully met, but the allocation is under its fairshare. The code also seems simpler and less confusing this way.
            2. TestFairScheduler.testReservationWithMultiplePriorities(): The reservation-at-lower-priority assert is retained. I did drop the asserts for scheduler resources, but retained the checks around running containers. Since the test is for verifying node reservation behavior, other asserts are misleading.
            kasha Karthik Kambatla added a comment - Thanks for the prompt review, Daniel. The updated patch incorporates all your suggestions but for the following: FSAppAttempt#fairShareStarvation() : I have reverted the meaning of starved . The reason for this is to ensure we don't mark an app whose demand is fully met, but the allocation is under its fairshare. The code also seems simpler and less confusing this way. TestFairScheduler.testReservationWithMultiplePriorities() : The reservation-at-lower-priority assert is retained. I did drop the asserts for scheduler resources, but retained the checks around running containers. Since the test is for verifying node reservation behavior, other asserts are misleading.
            hadoopqa Hadoop QA added a comment -
            +1 overall



            Vote Subsystem Runtime Comment
            0 reexec 0m 10s Docker mode activated.
            +1 @author 0m 0s The patch does not contain any @author tags.
            +1 test4tests 0m 0s The patch appears to include 2 new or modified test files.
            0 mvndep 0m 43s Maven dependency ordering for branch
            +1 mvninstall 12m 45s trunk passed
            +1 compile 8m 21s trunk passed
            +1 checkstyle 0m 50s trunk passed
            +1 mvnsite 1m 15s trunk passed
            +1 mvneclipse 0m 38s trunk passed
            +1 findbugs 2m 13s trunk passed
            +1 javadoc 1m 1s trunk passed
            0 mvndep 0m 10s Maven dependency ordering for patch
            +1 mvninstall 1m 3s the patch passed
            +1 compile 6m 51s the patch passed
            +1 javac 6m 51s the patch passed
            +1 checkstyle 0m 51s the patch passed
            +1 mvnsite 1m 20s the patch passed
            +1 mvneclipse 0m 39s the patch passed
            +1 whitespace 0m 0s The patch has no whitespace issues.
            +1 findbugs 2m 39s the patch passed
            +1 javadoc 1m 2s the patch passed
            +1 unit 2m 32s hadoop-yarn-common in the patch passed.
            +1 unit 41m 40s hadoop-yarn-server-resourcemanager in the patch passed.
            +1 asflicense 0m 31s The patch does not generate ASF License warnings.
            95m 57s



            Subsystem Report/Notes
            Docker Image:yetus/hadoop:a9ad5d6
            JIRA Issue YARN-6210
            JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12853866/YARN-6210.3.patch
            Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
            uname Linux aea153c517c1 3.13.0-106-generic #153-Ubuntu SMP Tue Dec 6 15:44:32 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
            Build tool maven
            Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
            git revision trunk / 003ae00
            Default Java 1.8.0_121
            findbugs v3.0.0
            Test Results https://builds.apache.org/job/PreCommit-YARN-Build/15034/testReport/
            modules C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager U: hadoop-yarn-project/hadoop-yarn
            Console output https://builds.apache.org/job/PreCommit-YARN-Build/15034/console
            Powered by Apache Yetus 0.5.0-SNAPSHOT http://yetus.apache.org

            This message was automatically generated.

            hadoopqa Hadoop QA added a comment - +1 overall Vote Subsystem Runtime Comment 0 reexec 0m 10s Docker mode activated. +1 @author 0m 0s The patch does not contain any @author tags. +1 test4tests 0m 0s The patch appears to include 2 new or modified test files. 0 mvndep 0m 43s Maven dependency ordering for branch +1 mvninstall 12m 45s trunk passed +1 compile 8m 21s trunk passed +1 checkstyle 0m 50s trunk passed +1 mvnsite 1m 15s trunk passed +1 mvneclipse 0m 38s trunk passed +1 findbugs 2m 13s trunk passed +1 javadoc 1m 1s trunk passed 0 mvndep 0m 10s Maven dependency ordering for patch +1 mvninstall 1m 3s the patch passed +1 compile 6m 51s the patch passed +1 javac 6m 51s the patch passed +1 checkstyle 0m 51s the patch passed +1 mvnsite 1m 20s the patch passed +1 mvneclipse 0m 39s the patch passed +1 whitespace 0m 0s The patch has no whitespace issues. +1 findbugs 2m 39s the patch passed +1 javadoc 1m 2s the patch passed +1 unit 2m 32s hadoop-yarn-common in the patch passed. +1 unit 41m 40s hadoop-yarn-server-resourcemanager in the patch passed. +1 asflicense 0m 31s The patch does not generate ASF License warnings. 95m 57s Subsystem Report/Notes Docker Image:yetus/hadoop:a9ad5d6 JIRA Issue YARN-6210 JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12853866/YARN-6210.3.patch Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle uname Linux aea153c517c1 3.13.0-106-generic #153-Ubuntu SMP Tue Dec 6 15:44:32 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux Build tool maven Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh git revision trunk / 003ae00 Default Java 1.8.0_121 findbugs v3.0.0 Test Results https://builds.apache.org/job/PreCommit-YARN-Build/15034/testReport/ modules C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager U: hadoop-yarn-project/hadoop-yarn Console output https://builds.apache.org/job/PreCommit-YARN-Build/15034/console Powered by Apache Yetus 0.5.0-SNAPSHOT http://yetus.apache.org This message was automatically generated.

            Looks good. A couple tiny nits:

            • FSAppAttempt.isStarved(Resource usage, Resource share): probably a bad name. It doesn't have anything to do with starving. It's a comparator.
            • The reservation-at-lower-priority assert is retained.

              Nope, the assignment-at-lower-priority assert is retained. The original test also had an assert to test the reservation itself:
               // Reserved container should still be at lower priority		
                  for (RMContainer container : app.getReservedContainers()) {		
                    assertEquals(2,		
                        container.getReservedSchedulerKey().getPriority().getPriority());		
                  }
            • On the assert messages, please keep in mind what the person triaging the failures will see, e.g. "Basic allocation failed expected:<1> but was:<0>." The message doesn't explain the numbers. Not a major issue, and not something I expect you to fix, but I wanted to point it out.
            templedf Daniel Templeton added a comment - Looks good. A couple tiny nits: FSAppAttempt.isStarved(Resource usage, Resource share) : probably a bad name. It doesn't have anything to do with starving. It's a comparator. The reservation-at-lower-priority assert is retained. Nope, the assignment-at-lower-priority assert is retained. The original test also had an assert to test the reservation itself: // Reserved container should still be at lower priority for (RMContainer container : app.getReservedContainers()) { assertEquals(2, container.getReservedSchedulerKey().getPriority().getPriority()); } On the assert messages, please keep in mind what the person triaging the failures will see, e.g. "Basic allocation failed expected:<1> but was:<0>." The message doesn't explain the numbers. Not a major issue, and not something I expect you to fix, but I wanted to point it out.

            FSAppAttempt.isStarved(Resource usage, Resource share): probably a bad name. It doesn't have anything to do with starving. It's a comparator.

            I agree it is not the best name, but I couldn't come up with a better name that will not be used incorrectly in the future. I am happy to accommodate any valid suggestions.

            Nope, the assignment-at-lower-priority assert is retained. The original test also had an assert to test the reservation itself:

            This test deals in fewer containers. There is a check for reservation succeeding and subsequently allocation succeeding. Since the behavior changed on when reservations are allowed, the app cannot reserve more than one container.

            kasha Karthik Kambatla added a comment - FSAppAttempt.isStarved(Resource usage, Resource share): probably a bad name. It doesn't have anything to do with starving. It's a comparator. I agree it is not the best name, but I couldn't come up with a better name that will not be used incorrectly in the future. I am happy to accommodate any valid suggestions. Nope, the assignment-at-lower-priority assert is retained. The original test also had an assert to test the reservation itself: This test deals in fewer containers. There is a check for reservation succeeding and subsequently allocation succeeding. Since the behavior changed on when reservations are allowed, the app cannot reserve more than one container.

            bq I am happy to accommodate any valid suggestions.

            How about isUsageBelowShare()?

            templedf Daniel Templeton added a comment - bq I am happy to accommodate any valid suggestions. How about isUsageBelowShare() ?

            That does sound better than isStarved. v4 renames the method.

            kasha Karthik Kambatla added a comment - That does sound better than isStarved. v4 renames the method.
            hadoopqa Hadoop QA added a comment -
            +1 overall



            Vote Subsystem Runtime Comment
            0 reexec 0m 12s Docker mode activated.
            +1 @author 0m 0s The patch does not contain any @author tags.
            +1 test4tests 0m 0s The patch appears to include 2 new or modified test files.
            0 mvndep 0m 43s Maven dependency ordering for branch
            +1 mvninstall 12m 35s trunk passed
            +1 compile 8m 43s trunk passed
            +1 checkstyle 0m 49s trunk passed
            +1 mvnsite 1m 13s trunk passed
            +1 mvneclipse 0m 39s trunk passed
            +1 findbugs 2m 5s trunk passed
            +1 javadoc 0m 59s trunk passed
            0 mvndep 0m 10s Maven dependency ordering for patch
            +1 mvninstall 0m 56s the patch passed
            +1 compile 6m 6s the patch passed
            +1 javac 6m 6s the patch passed
            +1 checkstyle 0m 47s the patch passed
            +1 mvnsite 1m 11s the patch passed
            +1 mvneclipse 0m 36s the patch passed
            +1 whitespace 0m 0s The patch has no whitespace issues.
            +1 findbugs 2m 31s the patch passed
            +1 javadoc 1m 1s the patch passed
            +1 unit 2m 36s hadoop-yarn-common in the patch passed.
            +1 unit 40m 48s hadoop-yarn-server-resourcemanager in the patch passed.
            +1 asflicense 0m 31s The patch does not generate ASF License warnings.
            93m 44s



            Subsystem Report/Notes
            Docker Image:yetus/hadoop:a9ad5d6
            JIRA Issue YARN-6210
            JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12854043/YARN-6210.4.patch
            Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
            uname Linux da34fbf21de6 3.13.0-107-generic #154-Ubuntu SMP Tue Dec 20 09:57:27 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
            Build tool maven
            Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
            git revision trunk / 1a6ca75
            Default Java 1.8.0_121
            findbugs v3.0.0
            Test Results https://builds.apache.org/job/PreCommit-YARN-Build/15047/testReport/
            modules C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager U: hadoop-yarn-project/hadoop-yarn
            Console output https://builds.apache.org/job/PreCommit-YARN-Build/15047/console
            Powered by Apache Yetus 0.5.0-SNAPSHOT http://yetus.apache.org

            This message was automatically generated.

            hadoopqa Hadoop QA added a comment - +1 overall Vote Subsystem Runtime Comment 0 reexec 0m 12s Docker mode activated. +1 @author 0m 0s The patch does not contain any @author tags. +1 test4tests 0m 0s The patch appears to include 2 new or modified test files. 0 mvndep 0m 43s Maven dependency ordering for branch +1 mvninstall 12m 35s trunk passed +1 compile 8m 43s trunk passed +1 checkstyle 0m 49s trunk passed +1 mvnsite 1m 13s trunk passed +1 mvneclipse 0m 39s trunk passed +1 findbugs 2m 5s trunk passed +1 javadoc 0m 59s trunk passed 0 mvndep 0m 10s Maven dependency ordering for patch +1 mvninstall 0m 56s the patch passed +1 compile 6m 6s the patch passed +1 javac 6m 6s the patch passed +1 checkstyle 0m 47s the patch passed +1 mvnsite 1m 11s the patch passed +1 mvneclipse 0m 36s the patch passed +1 whitespace 0m 0s The patch has no whitespace issues. +1 findbugs 2m 31s the patch passed +1 javadoc 1m 1s the patch passed +1 unit 2m 36s hadoop-yarn-common in the patch passed. +1 unit 40m 48s hadoop-yarn-server-resourcemanager in the patch passed. +1 asflicense 0m 31s The patch does not generate ASF License warnings. 93m 44s Subsystem Report/Notes Docker Image:yetus/hadoop:a9ad5d6 JIRA Issue YARN-6210 JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12854043/YARN-6210.4.patch Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle uname Linux da34fbf21de6 3.13.0-107-generic #154-Ubuntu SMP Tue Dec 20 09:57:27 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux Build tool maven Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh git revision trunk / 1a6ca75 Default Java 1.8.0_121 findbugs v3.0.0 Test Results https://builds.apache.org/job/PreCommit-YARN-Build/15047/testReport/ modules C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager U: hadoop-yarn-project/hadoop-yarn Console output https://builds.apache.org/job/PreCommit-YARN-Build/15047/console Powered by Apache Yetus 0.5.0-SNAPSHOT http://yetus.apache.org This message was automatically generated.

            I got two +1s from Jenkins. templedf - don't you think that is enough to get a +1 from a committer too?

            kasha Karthik Kambatla added a comment - I got two +1s from Jenkins. templedf - don't you think that is enough to get a +1 from a committer too?
            templedf Daniel Templeton added a comment - - edited

            You're gonna love me for this one...

            We now have:

                if (!starved) {
                  lastTimeAtFairShare = now;
                }
            
                if (!starved ||
                    now - lastTimeAtFairShare < fsQueue.getFairSharePreemptionTimeout()) {
                  fairshareStarvation = Resources.none();
                } 
            

            What about:

                if (!starved ||
                    now - lastTimeAtFairShare < fsQueue.getFairSharePreemptionTimeout()) {
                  if (!starved) {
                    lastTimeAtFairShare = now;
                  }
            
                  fairshareStarvation = Resources.none();
                } 
            

            ?

            Just a thought. I'm a +1 either way.

            templedf Daniel Templeton added a comment - - edited You're gonna love me for this one... We now have: if (!starved) { lastTimeAtFairShare = now; } if (!starved || now - lastTimeAtFairShare < fsQueue.getFairSharePreemptionTimeout()) { fairshareStarvation = Resources.none(); } What about: if (!starved || now - lastTimeAtFairShare < fsQueue.getFairSharePreemptionTimeout()) { if (!starved) { lastTimeAtFairShare = now; } fairshareStarvation = Resources.none(); } ? Just a thought. I'm a +1 either way.

            I like what we have better. It is clearer; nested ifs are more confusing because one has to hold the conditions of the previous if in context. Also, I don't see it being more efficient or anything either.

            Since you are +1 either way, I ll go ahead and commit this.

            kasha Karthik Kambatla added a comment - I like what we have better. It is clearer; nested ifs are more confusing because one has to hold the conditions of the previous if in context. Also, I don't see it being more efficient or anything either. Since you are +1 either way, I ll go ahead and commit this.

            Just committed this to trunk and branch-2. Thanks templedf for the careful reviews.

            kasha Karthik Kambatla added a comment - Just committed this to trunk and branch-2. Thanks templedf for the careful reviews.
            hudson Hudson added a comment -

            SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #11291 (See https://builds.apache.org/job/Hadoop-trunk-Commit/11291/)
            YARN-6210. FairScheduler: Node reservations can interfere with (kasha: rev 718ad9f6ee93d4145f2bb19b7582ce4e1174feaf)

            • (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/policies/FairSharePolicy.java
            • (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/policies/DominantResourceFairnessPolicy.java
            • (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/resource/ResourceCalculator.java
            • (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairScheduler.java
            • (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSAppAttempt.java
            • (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/resource/DominantResourceCalculator.java
            • (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/resource/DefaultResourceCalculator.java
            • (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairSchedulerPreemption.java
            hudson Hudson added a comment - SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #11291 (See https://builds.apache.org/job/Hadoop-trunk-Commit/11291/ ) YARN-6210 . FairScheduler: Node reservations can interfere with (kasha: rev 718ad9f6ee93d4145f2bb19b7582ce4e1174feaf) (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/policies/FairSharePolicy.java (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/policies/DominantResourceFairnessPolicy.java (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/resource/ResourceCalculator.java (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairScheduler.java (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSAppAttempt.java (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/resource/DominantResourceCalculator.java (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/resource/DefaultResourceCalculator.java (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairSchedulerPreemption.java

            People

              kasha Karthik Kambatla
              kasha Karthik Kambatla
              Votes:
              0 Vote for this issue
              Watchers:
              5 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: