  1. Hadoop YARN
  2. YARN-6895

[FairScheduler] Preemption reservation may cause regular reservation leaks

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 3.0.0-alpha4
    • Fix Version/s: 2.9.0, 3.0.0-beta1
    • Component/s: fairscheduler
    • Labels:
      None

      Description

      We found a limitation in the implementation of YARN-6432. If the container released is smaller than the preemption request, a node reservation is created that is never deleted.
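
      The scenario described above can be illustrated with a toy model. This is only a sketch under assumptions: ToyNode, assignOrReserve and simulate are hypothetical names, resources are reduced to a single memory value in MB, and none of this is the actual FairScheduler code. It shows a preempted container of 1024 MB being released while the starved application asks for 2048 MB: the request does not fit, so a regular reservation is recorded as a side effect, and nothing in this toy preemption path removes it afterwards.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model only; these are NOT the real FairScheduler classes. */
final class ReservationLeakSketch {

    /** Hypothetical node: free memory in MB plus one reservation per app. */
    static final class ToyNode {
        int freeMb;
        final Map<String, Integer> reservedMbByApp = new HashMap<>();

        ToyNode(int freeMb) {
            this.freeMb = freeMb;
        }

        /** Place the request if it fits; otherwise record a regular reservation. */
        boolean assignOrReserve(String app, int requestMb) {
            if (requestMb <= freeMb) {
                freeMb -= requestMb;
                reservedMbByApp.remove(app);
                return true;
            }
            reservedMbByApp.put(app, requestMb);
            return false;
        }
    }

    /** A preempted container of releasedMb is released, then requestMb is tried. */
    static ToyNode simulate(int releasedMb, int requestMb) {
        ToyNode node = new ToyNode(0);
        node.freeMb += releasedMb;                      // preempted container released
        node.assignOrReserve("starvedApp", requestMb);  // starved app tries to use it
        return node;
    }

    public static void main(String[] args) {
        ToyNode node = simulate(1024, 2048);
        System.out.println("free MB: " + node.freeMb);
        System.out.println("reservations left behind: " + node.reservedMbByApp);
    }
}
```

      When the released container is at least as large as the request (for example simulate(2048, 2048)), the toy node allocates directly and no reservation is left behind.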

      1. YARN-6895.000.patch
        13 kB
        Miklos Szegedi
      2. YARN-6895.001.patch
        13 kB
        Miklos Szegedi
      3. YARN-6895.branch-2.000.patch
        13 kB
        Miklos Szegedi
      4. YARN-6895.branch-2.001.patch
        13 kB
        Miklos Szegedi

          Activity

          hadoopqa Hadoop QA added a comment -
          -1 overall



          Vote Subsystem Runtime Comment
          0 reexec 0m 15s Docker mode activated.
                Prechecks
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 test4tests 0m 0s The patch appears to include 1 new or modified test files.
                trunk Compile Tests
          +1 mvninstall 13m 43s trunk passed
          +1 compile 0m 33s trunk passed
          +1 checkstyle 0m 24s trunk passed
          +1 mvnsite 0m 35s trunk passed
          +1 findbugs 1m 2s trunk passed
          +1 javadoc 0m 22s trunk passed
                Patch Compile Tests
          +1 mvninstall 0m 34s the patch passed
          +1 compile 0m 32s the patch passed
          +1 javac 0m 32s the patch passed
          +1 checkstyle 0m 23s hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: The patch generated 0 new + 39 unchanged - 4 fixed = 39 total (was 43)
          +1 mvnsite 0m 33s the patch passed
          +1 whitespace 0m 0s The patch has no whitespace issues.
          +1 findbugs 1m 10s the patch passed
          +1 javadoc 0m 17s the patch passed
                Other Tests
          -1 unit 45m 54s hadoop-yarn-server-resourcemanager in the patch failed.
          +1 asflicense 0m 16s The patch does not generate ASF License warnings.
          67m 49s



          Reason Tests
          Failed junit tests hadoop.yarn.server.resourcemanager.scheduler.fair.TestFSAppStarvation
          Timed out junit tests org.apache.hadoop.yarn.server.resourcemanager.TestSubmitApplicationWithRMHA



          Subsystem Report/Notes
          Docker Image:yetus/hadoop:14b5c93
          JIRA Issue YARN-6895
          JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12879397/YARN-6895.000.patch
          Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
          uname Linux 5de1815a58cb 4.4.0-43-generic #63-Ubuntu SMP Wed Oct 12 13:48:03 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
          Build tool maven
          Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
          git revision trunk / 77791e4
          Default Java 1.8.0_131
          findbugs v3.1.0-RC1
          unit https://builds.apache.org/job/PreCommit-YARN-Build/16594/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt
          Test Results https://builds.apache.org/job/PreCommit-YARN-Build/16594/testReport/
          modules C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
          Console output https://builds.apache.org/job/PreCommit-YARN-Build/16594/console
          Powered by Apache Yetus 0.6.0-SNAPSHOT http://yetus.apache.org

          This message was automatically generated.

          miklos.szegedi@cloudera.com Miklos Szegedi added a comment -

          I verified that TestSubmitApplicationWithRMHA fails without the patch as well. I could not reproduce the TestFSAppStarvation failure.

          yufeigu Yufei Gu added a comment -

          Thanks Miklos Szegedi for the patch. One question: if a node without any preemption reservation releases resources smaller than the preemption resource request, does the scheduler still do the normal reservation?

          I was wondering whether it would be easier and cleaner to put resourcesPreemptedForApp, appIdToAppMap and totalResourcesPreempted into one single class. In that case, we may be able to get rid of appIdToAppMap and totalResourcesPreempted as well, and handle locking nicely.
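
          As a rough illustration of the single-class idea, the sketch below bundles the per-application bookkeeping behind one lock and derives the total on demand instead of storing it. Everything here is an assumption for illustration: PreemptionLedgerSketch and its methods are hypothetical, application IDs are plain strings, and resources are a single MB value. It is not the design from any attached patch.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical holder for preempted resources; not part of Hadoop. */
final class PreemptionLedgerSketch {
    // Single source of truth, guarded by this object's monitor.
    private final Map<String, Integer> preemptedMbByApp = new HashMap<>();

    /** Record that mb of preempted resources are held for appId. */
    synchronized void add(String appId, int mb) {
        preemptedMbByApp.merge(appId, mb, Integer::sum);
    }

    /** Drop everything recorded for appId and return the removed amount. */
    synchronized int clear(String appId) {
        Integer removed = preemptedMbByApp.remove(appId);
        return removed == null ? 0 : removed;
    }

    synchronized boolean isPreemptedForApp(String appId) {
        return preemptedMbByApp.containsKey(appId);
    }

    /** Total is computed from the map, so it cannot drift out of sync. */
    synchronized int total() {
        int sum = 0;
        for (int mb : preemptedMbByApp.values()) {
            sum += mb;
        }
        return sum;
    }

    public static void main(String[] args) {
        PreemptionLedgerSketch ledger = new PreemptionLedgerSketch();
        ledger.add("app_1", 1024);
        ledger.add("app_2", 2048);
        System.out.println("total preempted MB: " + ledger.total());
    }
}
```

          Deriving the total trades a small amount of work per call for one less field to keep consistent, which is the kind of simplification the suggestion is after.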

          Some nits:

          • Need to expand this wildcard import: import static org.junit.Assert.*;
          • Extra space on this line: return resourcesPreemptedForApp.containsKey(app);
          • The comment "Reserve only, if not reserved for preempted resources," is confusing to me. Can you rewrite this comment block?
          miklos.szegedi@cloudera.com Miklos Szegedi added a comment -

          Thank you, Yufei Gu.
          We will do a normal reservation if there are no active preemptions on the node for the app. Does this answer your question? Reservations can still be made on other nodes when we preempt on one node, but that should not be the cause of this regression, since that logic predates YARN-6432.

              // The desired container won't fit here, so reserve
              // Reserve only, if not reserved for preempted resources, otherwise
              // we may end up with duplicate reservations
              if (isReservable(capability) &&
                  !node.isPreemptedForApp(this) &&
                  reserve(pendingAsk.getPerAllocationResource(), node, reservedContainer,
                      type, schedulerKey)) {
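
          The effect of the extra condition in the snippet above can be sketched in isolation. This is a simplified model under assumptions: ToyNode, reserveIfAllowed and the string application IDs are hypothetical, only isPreemptedForApp mirrors a name from the snippet, and the isReservable(capability) check is left out. A regular reservation is made only when the request does not fit and the node holds no preempted resources for that application.

```java
import java.util.HashSet;
import java.util.Set;

/** Simplified model of the guard; not the real FSAppAttempt logic. */
final class ReserveGuardSketch {

    /** Hypothetical node with free memory and two bookkeeping sets. */
    static final class ToyNode {
        final int freeMb;
        final Set<String> appsWithPreemptedResources = new HashSet<>();
        final Set<String> reservedApps = new HashSet<>();

        ToyNode(int freeMb) {
            this.freeMb = freeMb;
        }

        boolean isPreemptedForApp(String app) {
            return appsWithPreemptedResources.contains(app);
        }
    }

    /** Returns true only if a new regular reservation was created. */
    static boolean reserveIfAllowed(ToyNode node, String app, int requestMb) {
        boolean fits = requestMb <= node.freeMb;
        // Skip the regular reservation when resources are already being
        // held on this node for the app because of preemption.
        if (!fits && !node.isPreemptedForApp(app)) {
            return node.reservedApps.add(app);
        }
        return false;
    }

    public static void main(String[] args) {
        ToyNode plain = new ToyNode(1024);
        System.out.println("no preemption: " + reserveIfAllowed(plain, "app_1", 2048));

        ToyNode preempting = new ToyNode(1024);
        preempting.appsWithPreemptedResources.add("app_1");
        System.out.println("with preemption: " + reserveIfAllowed(preempting, "app_1", 2048));
    }
}
```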
          

          I had a patch with a single class implementation but it was rejected by the reviewers. I think we can revisit but I would not add too many changes to this Jira for simplicity.

          hadoopqa Hadoop QA added a comment -
          -1 overall



          Vote Subsystem Runtime Comment
          0 reexec 0m 20s Docker mode activated.
                Prechecks
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 test4tests 0m 0s The patch appears to include 1 new or modified test files.
                trunk Compile Tests
          +1 mvninstall 16m 32s trunk passed
          +1 compile 0m 34s trunk passed
          +1 checkstyle 0m 25s trunk passed
          +1 mvnsite 0m 35s trunk passed
          +1 findbugs 1m 2s trunk passed
          +1 javadoc 0m 21s trunk passed
                Patch Compile Tests
          +1 mvninstall 0m 32s the patch passed
          +1 compile 0m 30s the patch passed
          +1 javac 0m 30s the patch passed
          +1 checkstyle 0m 24s hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: The patch generated 0 new + 39 unchanged - 4 fixed = 39 total (was 43)
          +1 mvnsite 0m 33s the patch passed
          +1 whitespace 0m 0s The patch has no whitespace issues.
          +1 findbugs 1m 7s the patch passed
          +1 javadoc 0m 17s the patch passed
                Other Tests
          -1 unit 47m 28s hadoop-yarn-server-resourcemanager in the patch failed.
          +1 asflicense 0m 13s The patch does not generate ASF License warnings.
          72m 10s



          Reason Tests
          Failed junit tests hadoop.yarn.server.resourcemanager.TestRMEmbeddedElector
          Timed out junit tests org.apache.hadoop.yarn.server.resourcemanager.TestReservationSystemWithRMHA



          Subsystem Report/Notes
          Docker Image:yetus/hadoop:14b5c93
          JIRA Issue YARN-6895
          JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12879915/YARN-6895.001.patch
          Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
          uname Linux 4877068cd193 4.4.0-43-generic #63-Ubuntu SMP Wed Oct 12 13:48:03 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
          Build tool maven
          Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
          git revision trunk / 778d4ed
          Default Java 1.8.0_131
          findbugs v3.1.0-RC1
          unit https://builds.apache.org/job/PreCommit-YARN-Build/16654/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt
          Test Results https://builds.apache.org/job/PreCommit-YARN-Build/16654/testReport/
          modules C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
          Console output https://builds.apache.org/job/PreCommit-YARN-Build/16654/console
          Powered by Apache Yetus 0.6.0-SNAPSHOT http://yetus.apache.org

          This message was automatically generated.

          miklos.szegedi@cloudera.com Miklos Szegedi added a comment -

          The failing unit tests are not related to the change.

          yufeigu Yufei Gu added a comment -

          Can you create followup JIRAs for my question and suggestion? Otherwise looks good to me.

          miklos.szegedi@cloudera.com Miklos Szegedi added a comment -

          I opened YARN-6925 and YARN-6926. Thank you!

          yufeigu Yufei Gu added a comment -

          +1. Thanks for the patch, Miklos Szegedi. Committed to trunk. It doesn't apply to branch-2. Can you rebase it to branch-2?

          hudson Hudson added a comment -

          SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #12104 (See https://builds.apache.org/job/Hadoop-trunk-Commit/12104/)
          YARN-6895. [FairScheduler] Preemption reservation may cause regular (yufei: rev 45535f8afae4e5bf4f60597fc29ba94b4e7743f3)

          • (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java
          • (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFSSchedulerNode.java
          • (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSSchedulerNode.java
          • (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSAppAttempt.java
          miklos.szegedi@cloudera.com Miklos Szegedi added a comment -

          Thank you. I attached the branch-2 patch.

          yufeigu Yufei Gu added a comment -

          Branch-2 patch compilation error:

          TestFSSchedulerNode.java:[72,33] incompatible types: java.lang.Object cannot be converted to org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainer
          
          miklos.szegedi@cloudera.com Miklos Szegedi added a comment -

          Attached a new patch.

          yufeigu Yufei Gu added a comment -

          Thanks for the patch, Miklos Szegedi. Committed to branch-2.


            People

            • Assignee:
              miklos.szegedi@cloudera.com Miklos Szegedi
              Reporter:
              miklos.szegedi@cloudera.com Miklos Szegedi
            • Votes:
              0
              Watchers:
              6
