Hadoop YARN / YARN-4865

Track Reserved resources in ResourceUsage and QueueCapacities

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.7.2
    • Fix Version/s: 2.8.0, 3.0.0-alpha1
    • Component/s: resourcemanager
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

As discussed in YARN-4678, capture reserved capacity separately in QueueCapacities for better tracking.
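As a rough illustration of what per-partition reserved tracking could look like, here is a minimal sketch. The class and method names are hypothetical stand-ins, not the actual ResourceUsage/QueueCapacities API, and a resource is simplified to a single long (e.g. memory) instead of YARN's multi-dimensional Resource:

```java
import java.util.HashMap;
import java.util.Map;

// Minimal, hypothetical sketch of tracking reserved resources per node
// partition, in the spirit of ResourceUsage/QueueCapacities. Real YARN
// resources carry memory and vcores; a single long stands in here.
class ReservedTrackerSketch {
  private final Map<String, Long> reservedByPartition = new HashMap<>();

  void incReserved(String partition, long res) {
    reservedByPartition.merge(partition, res, Long::sum);
  }

  void decReserved(String partition, long res) {
    reservedByPartition.merge(partition, -res, Long::sum);
  }

  long getReserved(String partition) {
    return reservedByPartition.getOrDefault(partition, 0L);
  }
}
```

The empty string can stand in for the default (no-label) partition, mirroring how node labels are keyed elsewhere in the scheduler.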

      1. 0003-YARN-4865-addendum.patch
        8 kB
        Sunil G
      2. 0003-YARN-4865.patch
        20 kB
        Sunil G
      3. 0002-YARN-4865.patch
        20 kB
        Sunil G
      4. 0001-YARN-4865.patch
        15 kB
        Sunil G

        Activity

        sunilg Sunil G added a comment -

Uploading an initial version of the patch. Wangda Tan, could you please help to check it?

        leftnoteasy Wangda Tan added a comment -

1. CSQueue: getReservedCapacity is not used by anyone and could be removed.

2. FiCaSchedulerApp:
Shouldn't call

this.queue.decReservedResource(node.getPartition(), resource);


inside FiCaSchedulerApp, because SchedulerApplicationAttempt shouldn't modify the queue's state directly.
Instead, you can do this in LeafQueue#completedContainer:

                // Book-keeping
                if (removed) {
                  
                  // Inform the ordering policy
                  orderingPolicy.containerReleased(application, rmContainer);
                  
                  releaseResource(clusterResource, application, container.getResource(),
                      node.getPartition(), rmContainer, false);
                }
        

Similarly, instead of calling queue.incReservedResource in SchedulerApplicationAttempt, we should do this in LeafQueue.

3. We need to add sufficient tests for this, since reserved resources will be used by many downstream modules like the UI/REST API, and by the scheduler internally to make decisions. Candidates to look at:

• Simple container reserve / simple unreserve / reserved-and-container-completed (e.g. preempted) / reserved-and-application-completed (could be added to TestContainerAllocation)
• It would also be helpful to add simple tests for container reserve / unreserve for node partitions (could be added to TestContainerAllocationForNodeLabel)
• Some tests could be merged or may not be necessary.
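The encapsulation point in item 2 above (only the queue should mutate its own reserved-resource state, via its own assignment and completion paths) can be sketched as follows. The class is a hypothetical simplification, not the real LeafQueue, and a single long again stands in for a Resource:

```java
// Hypothetical sketch: the queue owns its reserved book-keeping, so
// SchedulerApplicationAttempt never touches queue state directly.
class QueueSketch {
  private long reserved;

  // Mirrors incrementing reserved resources when a container is
  // reserved during the queue's own assignment path.
  void containerReserved(long reservedRes) {
    reserved += reservedRes;
  }

  // Mirrors the suggested book-keeping in LeafQueue#completedContainer:
  // for a normal container the reserved amount is zero and nothing changes.
  void completedContainer(long reservedRes) {
    if (reservedRes > 0) {
      reserved -= reservedRes;
    }
  }

  long getReserved() {
    return reserved;
  }
}
```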
        sunilg Sunil G added a comment -

        Thank you Wangda Tan for the comments. Sharing an updated patch.

I think it's better to call incReservedResource from LeafQueue#assignContainers than from allocateResource. Do you think that's fine? Because recoverContainer also calls allocateResource.

        sunilg Sunil G added a comment -

I think moveApp also has to be handled. I'll check and update the same.

        sunilg Sunil G added a comment -

I re-checked the moveApplication and recoverContainer flows in detail, and it seems we are not handling reserved containers in either scenario. So the current patch should be fine.

I am not very sure why we are not handling reservedContainer in the move scenario. Am I missing something? Wangda Tan, thoughts?

        leftnoteasy Wangda Tan added a comment -

        Thanks Sunil G,

        Only one comment:
1) Suggest moving the following check:

    if (reservedRes == null || reservedRes.equals(Resources.none())) {
      return;
    }

out of incReservedResource, and only call incReservedResource from assignContainers when necessary.

        Similarly, decReservedResource could be updated.
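The suggested refactoring (hoist the empty-reservation guard out of incReservedResource up to the assignContainers call site) might look like this simplified sketch. Resources.none() is stood in for by zero, and all names are illustrative, not the real YARN API:

```java
// Hypothetical sketch: the guard lives at the call site, so
// incReservedResource stays a plain, unconditional update.
class GuardSketch {
  private long reserved;

  // No null/none check here; callers invoke this only when there is
  // actually a reservation to account for.
  void incReservedResource(long res) {
    reserved += res;
  }

  // Stand-in for the assignContainers call site.
  void assignContainers(Long reservedRes) {
    if (reservedRes == null || reservedRes == 0L) {
      return; // equivalent of the null / Resources.none() check
    }
    incReservedResource(reservedRes);
  }

  long getReserved() {
    return reserved;
  }
}
```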

        leftnoteasy Wangda Tan added a comment -

        I am not very sure why are we not handling reservedContainer in move scenario...

I think you're correct; currently moveApplication doesn't include reserved containers, and previously reserved containers would be terminated after the move.
Per my understanding, the reasons are: the cost of reserving a container is low, and we may need to reconsider whether to reserve containers after a move.

        hadoopqa Hadoop QA added a comment -
        -1 overall



        Vote Subsystem Runtime Comment
        0 reexec 0m 16s Docker mode activated.
        +1 @author 0m 0s The patch does not contain any @author tags.
        +1 test4tests 0m 0s The patch appears to include 3 new or modified test files.
        +1 mvninstall 9m 52s trunk passed
        +1 compile 0m 49s trunk passed with JDK v1.8.0_74
        +1 compile 0m 37s trunk passed with JDK v1.7.0_95
        +1 checkstyle 0m 23s trunk passed
        +1 mvnsite 0m 42s trunk passed
        +1 mvneclipse 0m 17s trunk passed
        +1 findbugs 1m 27s trunk passed
        +1 javadoc 0m 37s trunk passed with JDK v1.8.0_74
        +1 javadoc 0m 33s trunk passed with JDK v1.7.0_95
        +1 mvninstall 0m 39s the patch passed
        +1 compile 0m 53s the patch passed with JDK v1.8.0_74
        +1 javac 0m 53s the patch passed
        +1 compile 0m 35s the patch passed with JDK v1.7.0_95
        +1 javac 0m 35s the patch passed
        -1 checkstyle 0m 20s hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: patch generated 4 new + 211 unchanged - 0 fixed = 215 total (was 211)
        +1 mvnsite 0m 39s the patch passed
        +1 mvneclipse 0m 14s the patch passed
        +1 whitespace 0m 0s Patch has no whitespace issues.
        +1 findbugs 1m 39s the patch passed
        +1 javadoc 0m 37s the patch passed with JDK v1.8.0_74
        +1 javadoc 0m 33s the patch passed with JDK v1.7.0_95
        -1 unit 70m 48s hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.8.0_74.
        -1 unit 67m 20s hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.7.0_95.
        -1 asflicense 0m 19s Patch generated 1 ASF License warnings.
        161m 24s



        Reason Tests
        JDK v1.8.0_74 Failed junit tests hadoop.yarn.server.resourcemanager.TestClientRMTokens
          hadoop.yarn.server.resourcemanager.TestAMAuthorization
        JDK v1.7.0_95 Failed junit tests hadoop.yarn.server.resourcemanager.TestClientRMTokens
          hadoop.yarn.server.resourcemanager.TestAMAuthorization



        Subsystem Report/Notes
        Docker Image:yetus/hadoop:fbe3e86
        JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12795371/0002-YARN-4865.patch
        JIRA Issue YARN-4865
        Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
        uname Linux 0bc2bb015042 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
        Build tool maven
        Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
        git revision trunk / 55ae143
        Default Java 1.7.0_95
        Multi-JDK versions /usr/lib/jvm/java-8-oracle:1.8.0_74 /usr/lib/jvm/java-7-openjdk-amd64:1.7.0_95
        findbugs v3.0.0
        checkstyle https://builds.apache.org/job/PreCommit-YARN-Build/10889/artifact/patchprocess/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt
        unit https://builds.apache.org/job/PreCommit-YARN-Build/10889/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager-jdk1.8.0_74.txt
        unit https://builds.apache.org/job/PreCommit-YARN-Build/10889/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager-jdk1.7.0_95.txt
        unit test logs https://builds.apache.org/job/PreCommit-YARN-Build/10889/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager-jdk1.8.0_74.txt https://builds.apache.org/job/PreCommit-YARN-Build/10889/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager-jdk1.7.0_95.txt
        JDK v1.7.0_95 Test Results https://builds.apache.org/job/PreCommit-YARN-Build/10889/testReport/
        asflicense https://builds.apache.org/job/PreCommit-YARN-Build/10889/artifact/patchprocess/patch-asflicense-problems.txt
        modules C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
        Console output https://builds.apache.org/job/PreCommit-YARN-Build/10889/console
        Powered by Apache Yetus 0.2.0 http://yetus.apache.org

        This message was automatically generated.

        sunilg Sunil G added a comment -

        Thanks Wangda Tan for the comments.

Yes, I can pull those null checks out for better readability. Updating a patch for the same.

        cost for reserve container is low.

Yes, I very much agree with this point. But when we do an app move, the target queue might not be among the least-used queues, and therefore the moved app may have to wait.

So if some containers were reserved for this app, it may not get them very quickly after the move (it may need to settle for rack-local). But it's not a core/major issue; rather, I think it's a nice-to-have enhancement if supported.

        sunilg Sunil G added a comment -

        Patch generated 1 ASF License warnings.

It's not related to this patch. I will wait for the new Jenkins report, and if it's still coming, we need to see why it's coming from an HDFS file.

        hadoopqa Hadoop QA added a comment -
        -1 overall



        Vote Subsystem Runtime Comment
        0 reexec 0m 9s Docker mode activated.
        +1 @author 0m 0s The patch does not contain any @author tags.
        +1 test4tests 0m 0s The patch appears to include 3 new or modified test files.
        +1 mvninstall 6m 50s trunk passed
        +1 compile 0m 27s trunk passed with JDK v1.8.0_74
        +1 compile 0m 30s trunk passed with JDK v1.7.0_95
        +1 checkstyle 0m 20s trunk passed
        +1 mvnsite 0m 36s trunk passed
        +1 mvneclipse 0m 15s trunk passed
        +1 findbugs 1m 8s trunk passed
        +1 javadoc 0m 21s trunk passed with JDK v1.8.0_74
        +1 javadoc 0m 26s trunk passed with JDK v1.7.0_95
        +1 mvninstall 0m 31s the patch passed
        +1 compile 0m 24s the patch passed with JDK v1.8.0_74
        +1 javac 0m 24s the patch passed
        +1 compile 0m 27s the patch passed with JDK v1.7.0_95
        +1 javac 0m 27s the patch passed
        -1 checkstyle 0m 18s hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: patch generated 4 new + 212 unchanged - 0 fixed = 216 total (was 212)
        +1 mvnsite 0m 31s the patch passed
        +1 mvneclipse 0m 13s the patch passed
        +1 whitespace 0m 0s Patch has no whitespace issues.
        +1 findbugs 1m 15s the patch passed
        +1 javadoc 0m 18s the patch passed with JDK v1.8.0_74
        +1 javadoc 0m 24s the patch passed with JDK v1.7.0_95
        -1 unit 60m 33s hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.8.0_74.
        -1 unit 61m 59s hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.7.0_95.
        -1 asflicense 0m 18s Patch generated 1 ASF License warnings.
        139m 10s



        Reason Tests
        JDK v1.8.0_74 Failed junit tests hadoop.yarn.server.resourcemanager.TestClientRMTokens
          hadoop.yarn.server.resourcemanager.TestAMAuthorization
        JDK v1.7.0_95 Failed junit tests hadoop.yarn.server.resourcemanager.TestClientRMTokens
          hadoop.yarn.server.resourcemanager.TestAMAuthorization



        Subsystem Report/Notes
        Docker Image:yetus/hadoop:fbe3e86
        JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12795614/0003-YARN-4865.patch
        JIRA Issue YARN-4865
        Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
        uname Linux beb7b8a25340 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
        Build tool maven
        Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
        git revision trunk / 90fcb16
        Default Java 1.7.0_95
        Multi-JDK versions /usr/lib/jvm/java-8-oracle:1.8.0_74 /usr/lib/jvm/java-7-openjdk-amd64:1.7.0_95
        findbugs v3.0.0
        checkstyle https://builds.apache.org/job/PreCommit-YARN-Build/10892/artifact/patchprocess/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt
        unit https://builds.apache.org/job/PreCommit-YARN-Build/10892/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager-jdk1.8.0_74.txt
        unit https://builds.apache.org/job/PreCommit-YARN-Build/10892/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager-jdk1.7.0_95.txt
        unit test logs https://builds.apache.org/job/PreCommit-YARN-Build/10892/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager-jdk1.8.0_74.txt https://builds.apache.org/job/PreCommit-YARN-Build/10892/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager-jdk1.7.0_95.txt
        JDK v1.7.0_95 Test Results https://builds.apache.org/job/PreCommit-YARN-Build/10892/testReport/
        asflicense https://builds.apache.org/job/PreCommit-YARN-Build/10892/artifact/patchprocess/patch-asflicense-problems.txt
        modules C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
        Console output https://builds.apache.org/job/PreCommit-YARN-Build/10892/console
        Powered by Apache Yetus 0.2.0 http://yetus.apache.org

        This message was automatically generated.

        leftnoteasy Wangda Tan added a comment -

        So if in case there were some container reserved for this app, after move it may not get that very fast (may need to settle for rack local). But its not a core/major issue, rather I think its nice to have enhancement if supported.

I think it's better to let the scheduler decide whether a container can be reserved after a move. In the above example, if an app moves to a queue but the app/queue has lower priority, we should let it wait to reserve containers.

        leftnoteasy Wangda Tan added a comment -

Latest patch looks good to me.

        Thanks Sunil G.

        sunilg Sunil G added a comment -

        I think it's better to let scheduler decide if a container can be reserved or not after move

I feel this makes sense. It can wait as per CS for now because the app/queue has lower priority.

        leftnoteasy Wangda Tan added a comment -

        Committed to trunk/branch-2, thanks Sunil G!

        hudson Hudson added a comment -

        FAILURE: Integrated in Hadoop-trunk-Commit #9523 (See https://builds.apache.org/job/Hadoop-trunk-Commit/9523/)
        YARN-4865. Track Reserved resources in ResourceUsage and (wangda: rev fc055a3cbe9545cf1c59421641c7b296aa33f953)

        • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestNodeLabelContainerAllocation.java
        • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestQueueCapacities.java
        • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/Queue.java
        • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestContainerAllocation.java
        • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/QueueCapacities.java
        • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.java
        • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CSQueueUtils.java
        • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fifo/FifoScheduler.java
        • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSQueue.java
        hudson Hudson added a comment -

        FAILURE: Integrated in Hadoop-trunk-Commit #9523 (See https://builds.apache.org/job/Hadoop-trunk-Commit/9523/ ) YARN-4865. Track Reserved resources in ResourceUsage and (wangda: rev fc055a3cbe9545cf1c59421641c7b296aa33f953)

        • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestNodeLabelContainerAllocation.java
        • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestQueueCapacities.java
        • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/Queue.java
        • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestContainerAllocation.java
        • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/QueueCapacities.java
        • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.java
        • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CSQueueUtils.java
        • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fifo/FifoScheduler.java
        • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSQueue.java
        • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/AbstractCSQueue.java
        sunilg Sunil G added a comment -

        Thank you Wangda Tan for the review and commit!

        leftnoteasy Wangda Tan added a comment - edited

        Sunil G,

        It seems this patch needs one more fix:

        diff --git a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.java b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.java
        index 9a74c22..df57787 100644
        --- a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.java
        +++ b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.java
        @@ -1322,14 +1322,6 @@ public void completedContainer(Resource clusterResource,
        
                 // Book-keeping
                 if (removed) {
        -
        -          // track reserved resource for metrics, for normal container
        -          // getReservedResource will be null.
        -          Resource reservedRes = rmContainer.getReservedResource();
        -          if (reservedRes != null && !reservedRes.equals(Resources.none())) {
        -            decReservedResource(node.getPartition(), reservedRes);
        -          }
        -
                   // Inform the ordering policy
                   orderingPolicy.containerReleased(application, rmContainer);
        
        diff --git a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/common/fica/FiCaSchedulerApp.java b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/common/fica/FiCaSchedulerApp.java
        index cf1b3e0..558fc53 100644
        --- a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/common/fica/FiCaSchedulerApp.java
        +++ b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/common/fica/FiCaSchedulerApp.java
        @@ -247,6 +247,8 @@ public synchronized boolean unreserve(Priority priority,
               // Update reserved metrics
               queue.getMetrics().unreserveResource(getUser(),
                   rmContainer.getReservedResource());
        +
        +      queue.decReservedResource(node.getPartition(), rmContainer.getReservedResource());
               return true;
             }
             return false;
        

        We need the above change to make sure that an allocation from a reserved container correctly deducts the reserved resource. Sunil G, could you add a few tests as well?

        And some other cases in my mind that we need to consider:

        • Nodes lost / disconnected: we need to deduct reserved resources on such nodes. (I think this should be covered by the completedContainer code path.)

        The above can be addressed in a separate JIRA.

        (Thanks Karam Singh reporting this issue)
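
        The bookkeeping invariant behind the fix above can be sketched with a small self-contained tracker (a hypothetical class with simplified single-resource accounting, not YARN's actual QueueCapacities/Resource API): a reservation increments the queue's per-partition reserved total, and the matching decrement happens exactly once, at unreserve time, whether the reservation is cancelled or converts into an allocation.

        ```java
        import java.util.HashMap;
        import java.util.Map;

        /**
         * Minimal sketch of per-partition reserved-resource bookkeeping.
         * Memory is tracked as a single long (MB) for illustration only.
         */
        public class ReservedResourceTracker {
            private final Map<String, Long> reservedByPartition = new HashMap<>();

            /** Called when the scheduler reserves a container on a node. */
            public synchronized void incReserved(String partition, long memoryMb) {
                reservedByPartition.merge(partition, memoryMb, Long::sum);
            }

            /** Called once from unreserve(), for both cancellation and conversion to an allocation. */
            public synchronized void decReserved(String partition, long memoryMb) {
                reservedByPartition.merge(partition, -memoryMb, Long::sum);
            }

            public synchronized long getReserved(String partition) {
                return reservedByPartition.getOrDefault(partition, 0L);
            }

            public static void main(String[] args) {
                ReservedResourceTracker queue = new ReservedResourceTracker();
                queue.incReserved("", 4096);   // a 4096 MB container is reserved on the default partition
                queue.decReserved("", 4096);   // reservation converts to an allocation -> deduct here
                System.out.println(queue.getReserved(""));   // 0
            }
        }
        ```

        Deducting in unreserve() rather than in completedContainer avoids double-counting: a reservation that becomes a real allocation never reaches the completed-container path as a reservation.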

        sunilg Sunil G added a comment -

        Thanks Wangda Tan. I will add some more tests with this suggested change.

        sunilg Sunil G added a comment -

        Thank you Wangda Tan and Karam Singh.

        Attached a new patch. The test case covers the scenario below:

        • One container is reserved for app2 on node1.
        • A running container of app1 is killed, making enough space on node1 for app2's container.
        • The reserved container becomes allocated; the new metrics are verified against this.

        Please suggest whether this is fine. I will raise another ticket to handle cases such as node removal.
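
        As a rough illustration of that scenario (simplified single-resource arithmetic with made-up container sizes, not the actual test code or YARN classes), the reserved metric should rise when the reservation is placed and fall back to zero when the reservation converts to an allocation:

        ```java
        /**
         * Hypothetical walk-through of the test scenario: reserve for app2 on node1,
         * kill an app1 container, then convert the reservation to an allocation.
         * All sizes are illustrative memory values in MB.
         */
        public class ReservedMetricsScenario {
            public static void main(String[] args) {
                long nodeAvailable = 8192;   // node1 total free memory
                long queueReserved = 0;      // queue-level reserved metric

                // app1 holds a 4096 MB container, leaving 4096 MB free on node1.
                nodeAvailable -= 4096;

                // app2 asks for 6144 MB; it does not fit, so the scheduler
                // reserves the request on node1 and bumps the reserved metric.
                long app2Request = 6144;
                queueReserved += app2Request;

                // app1's container is killed, freeing its 4096 MB.
                nodeAvailable += 4096;

                // The reservation now fits: it converts to an allocation and
                // the reserved metric is deducted at unreserve time.
                if (nodeAvailable >= app2Request) {
                    nodeAvailable -= app2Request;
                    queueReserved -= app2Request;
                }

                System.out.println(queueReserved);   // 0
                System.out.println(nodeAvailable);   // 2048
            }
        }
        ```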

        leftnoteasy Wangda Tan added a comment -

        Thanks Sunil G,

        Could you put the patch in a new JIRA? Addendum patches sometimes cause backporting issues. I think the node-removed case should already be handled by the normal container-completion path; adding a simple test will be enough.

        sunilg Sunil G added a comment -

        Thanks Wangda Tan.
        I will attach this patch to a new ticket, and will also include test cases for node lost/removed scenarios.

        eepayne Eric Payne added a comment -

        I am backporting this to branch-2.8. It is a required prerequisite for YARN-4945, intra-queue preemption.


          People

          • Assignee:
            sunilg Sunil G
            Reporter:
            sunilg Sunil G
          • Votes:
            0
            Watchers:
            7

            Dates

            • Created:
              Updated:
              Resolved:
