RM recovery too slow due to LeafQueue#activateApplications()

    Details

    • Hadoop Flags:
      Reviewed

      Description

      1. Submit 10K applications to the default queue.
      2. All applications are in ACCEPTED state.
      3. Now restart the ResourceManager.

      During recovery of each application, LeafQueue#activateApplications() is invoked, resulting in the AM limit check being performed even before any NodeManagers have registered.

      The total number of iterations for N applications is about N(N+1)/2; for 10K applications that is roughly 50,000,000 iterations, causing the RM to take more than 10 minutes to become active.

      Since NM resources have not yet been added during recovery, we should skip activateApplications().
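      A self-contained sketch (plain Java, not YARN code; the 10K figure and the early-exit idea come from the description above) of why the iteration count explodes during recovery and how skipping activateApplications() while the cluster resource is still zero avoids it:

          // Sketch only: counts AM-limit checks performed during recovery,
          // with and without the proposed early exit.
          public class ActivationCostSketch {
            public static void main(String[] args) {
              final int apps = 10_000;                    // 10K recovered applications
              final boolean clusterResourceIsZero = true; // no NodeManager registered yet

              long checksWithoutSkip = 0;
              long checksWithSkip = 0;

              // Recovering each application triggers activateApplications(), which
              // walks every application still pending in the queue.
              for (int recovered = 1; recovered <= apps; recovered++) {
                for (int pending = 1; pending <= recovered; pending++) {
                  checksWithoutSkip++;                    // one AM-limit check per pending app
                }
                if (clusterResourceIsZero) {
                  checksWithSkip++;                       // single check, then skip the rest
                }
              }

              System.out.println("without skip: " + checksWithoutSkip); // N(N+1)/2 = 50,005,000
              System.out.println("with skip:    " + checksWithSkip);    // N = 10,000
            }
          }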

      1. YARN-5773-branch-2.8.0001.patch
        9 kB
        Bibin A Chundatt
      2. YARN-5773.0009.patch
        9 kB
        Bibin A Chundatt
      3. YARN-5773.0008.patch
        8 kB
        Varun Saxena
      4. YARN-5773.0007.patch
        7 kB
        Bibin A Chundatt
      5. YARN-5773.0006.patch
        4 kB
        Bibin A Chundatt
      6. YARN-5773.0005.patch
        4 kB
        Bibin A Chundatt
      7. YARN-5773.0004.patch
        4 kB
        Bibin A Chundatt
      8. YARN-5773.0002.patch
        7 kB
        Bibin A Chundatt
      9. YARN-5773.0001.patch
        7 kB
        Bibin A Chundatt

        Issue Links

          Activity

          Varun Saxena added a comment -

          Committed to trunk, branch-2 and branch-2.8
          Thanks Bibin A Chundatt for your contribution and thanks Sunil G, Wangda Tan and Naganarasimha G R for reviews.

          Varun Saxena added a comment -

          Thanks Bibin A Chundatt for providing branch-2.8 patch.
          Test failures reported above are known issues, unrelated to your patch.

          Will commit on branch-2.8 too.

          Bibin A Chundatt added a comment - edited
          JDK v1.8.0_111 failed junit tests:
            hadoop.yarn.server.resourcemanager.TestClientRMTokens
            hadoop.yarn.server.resourcemanager.TestAMAuthorization
          JDK v1.7.0_111 failed junit tests:
            hadoop.yarn.server.resourcemanager.TestClientRMTokens
            hadoop.yarn.server.resourcemanager.TestAMAuthorization

          IIUC these cases are due to hostname resolution.

          Hadoop QA added a comment -
          -1 overall



          Vote Subsystem Runtime Comment
          0 reexec 17m 18s Docker mode activated.
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 test4tests 0m 0s The patch appears to include 3 new or modified test files.
          +1 mvninstall 6m 47s branch-2.8 passed
          +1 compile 0m 27s branch-2.8 passed with JDK v1.8.0_111
          +1 compile 0m 31s branch-2.8 passed with JDK v1.7.0_111
          +1 checkstyle 0m 18s branch-2.8 passed
          +1 mvnsite 0m 36s branch-2.8 passed
          +1 mvneclipse 0m 17s branch-2.8 passed
          +1 findbugs 1m 9s branch-2.8 passed
          +1 javadoc 0m 20s branch-2.8 passed with JDK v1.8.0_111
          +1 javadoc 0m 23s branch-2.8 passed with JDK v1.7.0_111
          +1 mvninstall 0m 29s the patch passed
          +1 compile 0m 24s the patch passed with JDK v1.8.0_111
          +1 javac 0m 24s the patch passed
          +1 compile 0m 28s the patch passed with JDK v1.7.0_111
          +1 javac 0m 28s the patch passed
          +1 checkstyle 0m 16s the patch passed
          +1 mvnsite 0m 34s the patch passed
          +1 mvneclipse 0m 14s the patch passed
          +1 whitespace 0m 0s The patch has no whitespace issues.
          +1 findbugs 1m 16s the patch passed
          +1 javadoc 0m 16s the patch passed with JDK v1.8.0_111
          +1 javadoc 0m 21s the patch passed with JDK v1.7.0_111
          -1 unit 74m 44s hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.7.0_111.
          +1 asflicense 0m 16s The patch does not generate ASF License warnings.
          182m 41s



          Reason Tests
          JDK v1.8.0_111 Failed junit tests hadoop.yarn.server.resourcemanager.TestClientRMTokens
            hadoop.yarn.server.resourcemanager.TestAMAuthorization
          JDK v1.7.0_111 Failed junit tests hadoop.yarn.server.resourcemanager.TestClientRMTokens
            hadoop.yarn.server.resourcemanager.TestAMAuthorization



          Subsystem Report/Notes
          Docker Image:yetus/hadoop:5af2af1
          JIRA Issue YARN-5773
          JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12835992/YARN-5773-branch-2.8.0001.patch
          Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
          uname Linux d16b4241562e 3.13.0-95-generic #142-Ubuntu SMP Fri Aug 12 17:00:09 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
          Build tool maven
          Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
          git revision branch-2.8 / 9b6d277
          Default Java 1.7.0_111
          Multi-JDK versions /usr/lib/jvm/java-8-oracle:1.8.0_111 /usr/lib/jvm/java-7-openjdk-amd64:1.7.0_111
          findbugs v3.0.0
          unit https://builds.apache.org/job/PreCommit-YARN-Build/13670/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager-jdk1.7.0_111.txt
          JDK v1.7.0_111 Test Results https://builds.apache.org/job/PreCommit-YARN-Build/13670/testReport/
          modules C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
          Console output https://builds.apache.org/job/PreCommit-YARN-Build/13670/console
          Powered by Apache Yetus 0.4.0-SNAPSHOT http://yetus.apache.org

          This message was automatically generated.

          Hadoop QA added a comment -
          -1 overall



          Vote Subsystem Runtime Comment
          0 reexec 0m 22s Docker mode activated.
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 test4tests 0m 0s The patch appears to include 3 new or modified test files.
          +1 mvninstall 6m 33s branch-2.8 passed
          +1 compile 0m 27s branch-2.8 passed with JDK v1.8.0_101
          +1 compile 0m 31s branch-2.8 passed with JDK v1.7.0_111
          +1 checkstyle 0m 20s branch-2.8 passed
          +1 mvnsite 0m 36s branch-2.8 passed
          +1 mvneclipse 0m 17s branch-2.8 passed
          +1 findbugs 1m 9s branch-2.8 passed
          +1 javadoc 0m 19s branch-2.8 passed with JDK v1.8.0_101
          +1 javadoc 0m 24s branch-2.8 passed with JDK v1.7.0_111
          +1 mvninstall 0m 30s the patch passed
          +1 compile 0m 25s the patch passed with JDK v1.8.0_101
          +1 javac 0m 25s the patch passed
          +1 compile 0m 29s the patch passed with JDK v1.7.0_111
          +1 javac 0m 29s the patch passed
          +1 checkstyle 0m 17s the patch passed
          +1 mvnsite 0m 34s the patch passed
          +1 mvneclipse 0m 15s the patch passed
          +1 whitespace 0m 0s The patch has no whitespace issues.
          +1 findbugs 1m 17s the patch passed
          +1 javadoc 0m 17s the patch passed with JDK v1.8.0_101
          +1 javadoc 0m 20s the patch passed with JDK v1.7.0_111
          -1 unit 70m 40s hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.7.0_111.
          +1 asflicense 0m 17s The patch does not generate ASF License warnings.
          157m 21s



          Reason Tests
          JDK v1.8.0_101 Failed junit tests hadoop.yarn.server.resourcemanager.TestAMAuthorization
            hadoop.yarn.server.resourcemanager.TestClientRMTokens
          JDK v1.7.0_111 Failed junit tests hadoop.yarn.server.resourcemanager.TestAMAuthorization
            hadoop.yarn.server.resourcemanager.TestClientRMTokens



          Subsystem Report/Notes
          Docker Image:yetus/hadoop:5af2af1
          JIRA Issue YARN-5773
          JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12835992/YARN-5773-branch-2.8.0001.patch
          Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
          uname Linux 474cf9a43461 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
          Build tool maven
          Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
          git revision branch-2.8 / 9b6d277
          Default Java 1.7.0_111
          Multi-JDK versions /usr/lib/jvm/java-8-oracle:1.8.0_101 /usr/lib/jvm/java-7-openjdk-amd64:1.7.0_111
          findbugs v3.0.0
          unit https://builds.apache.org/job/PreCommit-YARN-Build/13671/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager-jdk1.7.0_111.txt
          JDK v1.7.0_111 Test Results https://builds.apache.org/job/PreCommit-YARN-Build/13671/testReport/
          modules C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
          Console output https://builds.apache.org/job/PreCommit-YARN-Build/13671/console
          Powered by Apache Yetus 0.4.0-SNAPSHOT http://yetus.apache.org

          This message was automatically generated.

          Bibin A Chundatt added a comment -

          Attaching branch-2.8 patch

          Varun Saxena added a comment -

          Bibin A Chundatt, the patch does not apply cleanly on branch-2.8.
          Kindly update the patch for it.

          Hudson added a comment -

          SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #10727 (See https://builds.apache.org/job/Hadoop-trunk-Commit/10727/)
          YARN-5773. RM recovery too slow due to LeafQueue#activateApplications (varunsaxena: rev 1c8ab41e8b3477a93cbdf0b553a87b131eb60e1f)

          • (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestApplicationPriority.java
          • (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CSAMContainerLaunchDiagnosticsConstants.java
          • (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.java
          • (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestCapacityScheduler.java
          • (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestClientRMService.java
          Varun Saxena added a comment -

          +1.
          Will commit it shortly.

          Hadoop QA added a comment -
          +1 overall



          Vote Subsystem Runtime Comment
          0 reexec 0m 13s Docker mode activated.
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 test4tests 0m 0s The patch appears to include 3 new or modified test files.
          +1 mvninstall 6m 36s trunk passed
          +1 compile 0m 32s trunk passed
          +1 checkstyle 0m 24s trunk passed
          +1 mvnsite 0m 37s trunk passed
          +1 mvneclipse 0m 17s trunk passed
          +1 findbugs 0m 57s trunk passed
          +1 javadoc 0m 20s trunk passed
          +1 mvninstall 0m 30s the patch passed
          +1 compile 0m 28s the patch passed
          +1 javac 0m 28s the patch passed
          +1 checkstyle 0m 20s the patch passed
          +1 mvnsite 0m 34s the patch passed
          +1 mvneclipse 0m 14s the patch passed
          +1 whitespace 0m 0s The patch has no whitespace issues.
          +1 findbugs 1m 3s the patch passed
          +1 javadoc 0m 18s the patch passed
          +1 unit 35m 49s hadoop-yarn-server-resourcemanager in the patch passed.
          +1 asflicense 0m 16s The patch does not generate ASF License warnings.
          50m 43s



          Subsystem Report/Notes
          Docker Image:yetus/hadoop:9560f25
          JIRA Issue YARN-5773
          JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12835951/YARN-5773.0009.patch
          Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
          uname Linux 92ceab75ba6e 3.13.0-93-generic #140-Ubuntu SMP Mon Jul 18 21:21:05 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
          Build tool maven
          Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
          git revision trunk / ebb8823
          Default Java 1.8.0_101
          findbugs v3.0.0
          Test Results https://builds.apache.org/job/PreCommit-YARN-Build/13663/testReport/
          modules C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
          Console output https://builds.apache.org/job/PreCommit-YARN-Build/13663/console
          Powered by Apache Yetus 0.4.0-SNAPSHOT http://yetus.apache.org

          This message was automatically generated.

          Bibin A Chundatt added a comment -

          Uploaded patch after fixing UT

          Varun Saxena added a comment -

          Bibin A Chundatt, the TestClientRMService failure is related. It times out with the changes. Can you look at it?

          Hadoop QA added a comment -
          -1 overall



          Vote Subsystem Runtime Comment
          0 reexec 0m 17s Docker mode activated.
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 test4tests 0m 0s The patch appears to include 2 new or modified test files.
          +1 mvninstall 6m 59s trunk passed
          +1 compile 0m 34s trunk passed
          +1 checkstyle 0m 23s trunk passed
          +1 mvnsite 0m 38s trunk passed
          +1 mvneclipse 0m 17s trunk passed
          +1 findbugs 1m 0s trunk passed
          +1 javadoc 0m 20s trunk passed
          +1 mvninstall 0m 31s the patch passed
          +1 compile 0m 30s the patch passed
          +1 javac 0m 30s the patch passed
          +1 checkstyle 0m 21s the patch passed
          +1 mvnsite 0m 35s the patch passed
          +1 mvneclipse 0m 15s the patch passed
          +1 whitespace 0m 0s The patch has no whitespace issues.
          +1 findbugs 1m 1s the patch passed
          +1 javadoc 0m 18s the patch passed
          -1 unit 35m 37s hadoop-yarn-server-resourcemanager in the patch failed.
          +1 asflicense 0m 15s The patch does not generate ASF License warnings.
          51m 13s



          Reason Tests
          Failed junit tests hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart
          Timed out junit tests org.apache.hadoop.yarn.server.resourcemanager.TestClientRMService



          Subsystem Report/Notes
          Docker Image:yetus/hadoop:9560f25
          JIRA Issue YARN-5773
          JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12835889/YARN-5773.0008.patch
          Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
          uname Linux 2b8559c5fb1a 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
          Build tool maven
          Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
          git revision trunk / 8a9388e
          Default Java 1.8.0_101
          findbugs v3.0.0
          unit https://builds.apache.org/job/PreCommit-YARN-Build/13651/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt
          Test Results https://builds.apache.org/job/PreCommit-YARN-Build/13651/testReport/
          modules C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
          Console output https://builds.apache.org/job/PreCommit-YARN-Build/13651/console
          Powered by Apache Yetus 0.4.0-SNAPSHOT http://yetus.apache.org

          This message was automatically generated.

          Varun Saxena added a comment -

          Thanks Bibin A Chundatt for the patch.
          Changes look fine to me.

          Will commit it pending Jenkins. There are a couple of typos in the log message below, though; will take care of them while committing.

          LOG.info("Skiping activateApplications for "
              + application.getApplicationAttemptId()
              + " since cluster resorce is " + Resources.none());
          
          Bibin A Chundatt added a comment -

          Attaching patch after handling UT fix.

          Sunil G added a comment -

          Yes. +1 for fixing UT cases here.

          Wangda Tan added a comment -

          Thanks Bibin A Chundatt for updating the patch, and all others for the reviews.

          The approach in the latest patch LGTM; the patch can be committed once the UT failures are fixed.

          Bibin A Chundatt added a comment -

          Sunil G/Varun Saxena
          Details of Failure
          TestApplicationPriority.testOrderOfActivatingThePriorityApplicationOnRMRestart

              // Before NM registration, AMResourceLimit threshold is 0. So 1st
              // applications get activated nevertheless of AMResourceLimit threshold
              // Two applications are in pending
              Assert.assertEquals(1, defaultQueue.getNumActiveApplications());
          

          As per the test case, even before NM registration it expects one application to be activated.

          TestCapacityScheduler.testAMUsedResource
          The NM is not started, and one AM is expected to be activated.
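          For illustration, a hypothetical adjustment of such an assertion (assumed MockRM/MockNM-style helpers, not the committed test change): with the early exit, nothing activates until a NodeManager registers.

              // Hypothetical sketch, not the committed test change.
              // Before any NM registers the cluster resource is zero, so no AM activates.
              Assert.assertEquals(0, defaultQueue.getNumActiveApplications());

              // Once a NodeManager registers and heartbeats, activation can proceed.
              MockNM nm1 = rm.registerNode("127.0.0.1:1234", 8 * 1024);
              nm1.nodeHeartbeat(true);
              Assert.assertEquals(1, defaultQueue.getNumActiveApplications());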

          Bibin A Chundatt added a comment -

          Varun Saxena

          hadoop.yarn.server.resourcemanager.TestRMRestart - YARN-5548
          All the other test cases look dependent on getNumActiveApplications. So can we go ahead with the 0005 patch?

          Varun Saxena added a comment -

          Bibin A Chundatt, a few if not all of the test failures are related. Kindly fix them.

          Hadoop QA added a comment -
          -1 overall



          Vote Subsystem Runtime Comment
          0 reexec 0m 18s Docker mode activated.
          +1 @author 0m 0s The patch does not contain any @author tags.
          -1 test4tests 0m 0s The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
          +1 mvninstall 8m 9s trunk passed
          +1 compile 0m 35s trunk passed
          +1 checkstyle 0m 21s trunk passed
          +1 mvnsite 0m 40s trunk passed
          +1 mvneclipse 0m 20s trunk passed
          +1 findbugs 1m 6s trunk passed
          +1 javadoc 0m 22s trunk passed
          +1 mvninstall 0m 34s the patch passed
          +1 compile 0m 31s the patch passed
          +1 javac 0m 31s the patch passed
          +1 checkstyle 0m 20s the patch passed
          +1 mvnsite 0m 44s the patch passed
          +1 mvneclipse 0m 15s the patch passed
          +1 whitespace 0m 0s The patch has no whitespace issues.
          +1 findbugs 1m 12s the patch passed
          +1 javadoc 0m 19s the patch passed
          -1 unit 36m 20s hadoop-yarn-server-resourcemanager in the patch failed.
          +1 asflicense 0m 31s The patch does not generate ASF License warnings.
          53m 58s



          Reason Tests
          Failed junit tests hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacityScheduler
            hadoop.yarn.server.resourcemanager.TestRMRestart
            hadoop.yarn.server.resourcemanager.scheduler.capacity.TestApplicationPriority
          Timed out junit tests org.apache.hadoop.yarn.server.resourcemanager.TestClientRMService



          Subsystem Report/Notes
          Docker Image:yetus/hadoop:9560f25
          JIRA Issue YARN-5773
          JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12835620/YARN-5773.0006.patch
          Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
          uname Linux cc52a21531b2 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
          Build tool maven
          Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
          git revision trunk / ac35ee9
          Default Java 1.8.0_101
          findbugs v3.0.0
          unit https://builds.apache.org/job/PreCommit-YARN-Build/13558/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt
          Test Results https://builds.apache.org/job/PreCommit-YARN-Build/13558/testReport/
          modules C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
          Console output https://builds.apache.org/job/PreCommit-YARN-Build/13558/console
          Powered by Apache Yetus 0.4.0-SNAPSHOT http://yetus.apache.org

          This message was automatically generated.

          Bibin A Chundatt added a comment -

          Attached a patch removing the number-of-apps check.

          Naganarasimha G R added a comment -

          I too agree not to have just one app in activated state.

          Varun Saxena added a comment - edited

          I did verify the same when cluster resource is empty and submit application. The first application attempt is activated.

          Yes, that will happen right now because we are checking for AM limits inside the loop. But it's not required, in my opinion, when we check for overall cluster resources. It's not that it's a bug per se in your current patch; it's just that the condition, in my opinion, is unnecessary. It is meant to be there for some other reason.
          As the log suggests, it is kept there to cover cases where maximum-am-resource-percent is set too low for a queue. We do not want to block apps in that case.
          When overall cluster resources are 0, not even one application being activated is because cluster resources are 0, not because the AM resource percent is insufficient.

          Thoughts?

                      LOG.warn("maximum-am-resource-percent is insufficient to start a"
                          + " single application in queue, it is likely set too low."
                          + " skipping enforcement to allow at least one application"
                          + " to start");
          
          Bibin A Chundatt added a comment - edited

          When cluster resource is 0, this is unlikely to be the case.

          I did verify the same when cluster resource is empty and submit application. The first application attempt is activated.
          I think it should be fine keeping it the same, since the loop will run only once. Hope no changes are required in the last patch.

          Sunil G added a comment -

          I also think getNumActiveApplications does not need to be checked during the recovery call flow.

          Varun Saxena added a comment -

          Earlier implementation also followed the same, if I understand correctly.

          The earlier implementation was to handle the case where the AM resource limit has been set too low. When cluster resource is 0, this is unlikely to be the case.

          Hadoop QA added a comment -
          -1 overall



          Vote Subsystem Runtime Comment
          0 reexec 0m 15s Docker mode activated.
          +1 @author 0m 0s The patch does not contain any @author tags.
          -1 test4tests 0m 0s The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
          +1 mvninstall 6m 47s trunk passed
          +1 compile 0m 32s trunk passed
          +1 checkstyle 0m 20s trunk passed
          +1 mvnsite 0m 39s trunk passed
          +1 mvneclipse 0m 16s trunk passed
          +1 findbugs 0m 57s trunk passed
          +1 javadoc 0m 21s trunk passed
          +1 mvninstall 0m 39s the patch passed
          +1 compile 0m 35s the patch passed
          +1 javac 0m 35s the patch passed
          +1 checkstyle 0m 21s the patch passed
          +1 mvnsite 0m 42s the patch passed
          +1 mvneclipse 0m 16s the patch passed
          +1 whitespace 0m 0s The patch has no whitespace issues.
          +1 findbugs 1m 20s the patch passed
          +1 javadoc 0m 21s the patch passed
          -1 unit 130m 38s hadoop-yarn-server-resourcemanager in the patch failed.
          +1 asflicense 0m 17s The patch does not generate ASF License warnings.
          146m 36s



          Reason Tests
          Failed junit tests hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacitySchedulerSurgicalPreemption
            hadoop.yarn.server.resourcemanager.scheduler.capacity.TestNodeLabelContainerAllocation
            hadoop.yarn.server.resourcemanager.security.TestDelegationTokenRenewer
            hadoop.yarn.server.resourcemanager.TestAMAuthorization
            hadoop.yarn.server.resourcemanager.TestApplicationMasterService
            hadoop.yarn.server.resourcemanager.security.TestAMRMTokens
            hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacityScheduler
            hadoop.yarn.server.resourcemanager.scheduler.capacity.TestApplicationLimitsByPartition
            hadoop.yarn.server.resourcemanager.scheduler.TestAbstractYarnScheduler
            hadoop.yarn.server.resourcemanager.scheduler.TestSchedulingWithAllocationRequestId
            hadoop.yarn.server.resourcemanager.TestApplicationCleanup
            hadoop.yarn.server.resourcemanager.scheduler.TestSchedulerUtils
            hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacitySchedulerLazyPreemption
            hadoop.yarn.server.resourcemanager.TestApplicationMasterLauncher
            hadoop.yarn.server.resourcemanager.TestKillApplicationWithRMHA
            hadoop.yarn.server.resourcemanager.TestRM
            hadoop.yarn.server.resourcemanager.rmapp.TestApplicationLifetimeMonitor
            hadoop.yarn.server.resourcemanager.scheduler.capacity.TestContainerAllocation
            hadoop.yarn.server.resourcemanager.rmapp.TestNodesListManager
            hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacitySchedulerNodeLabelUpdate
            hadoop.yarn.server.resourcemanager.scheduler.capacity.TestReservations
            hadoop.yarn.server.resourcemanager.scheduler.capacity.TestIncreaseAllocationExpirer
            hadoop.yarn.server.resourcemanager.TestDecommissioningNodesWatcher
            hadoop.yarn.server.resourcemanager.TestResourceManager
            hadoop.yarn.server.resourcemanager.scheduler.capacity.TestLeafQueue
            hadoop.yarn.server.resourcemanager.TestContainerResourceUsage
            hadoop.yarn.server.resourcemanager.scheduler.TestSchedulerHealth
            hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesSchedulerActivities
            hadoop.yarn.server.resourcemanager.scheduler.capacity.TestWorkPreservingRMRestartForNodeLabel
            hadoop.yarn.server.resourcemanager.scheduler.capacity.TestContainerResizing
            hadoop.yarn.server.resourcemanager.scheduler.capacity.TestApplicationLimits
            hadoop.yarn.server.resourcemanager.scheduler.policy.TestFairOrderingPolicy
            hadoop.yarn.server.resourcemanager.TestResourceTrackerService
            hadoop.yarn.server.resourcemanager.scheduler.capacity.TestApplicationPriority
            hadoop.yarn.server.resourcemanager.rmcontainer.TestRMContainerImpl
          Timed out junit tests org.apache.hadoop.yarn.server.resourcemanager.ahs.TestRMApplicationHistoryWriter
            org.apache.hadoop.yarn.server.resourcemanager.TestSignalContainer
            org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart
            org.apache.hadoop.yarn.server.resourcemanager.TestNodeBlacklistingOnAMFailures
            org.apache.hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRMRPCResponseId
            org.apache.hadoop.yarn.server.resourcemanager.recovery.TestFSRMStateStore
            org.apache.hadoop.yarn.server.resourcemanager.TestClientRMService
            org.apache.hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRMRPCNodeUpdates
            org.apache.hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRestart
            org.apache.hadoop.yarn.server.resourcemanager.recovery.TestZKRMStateStorePerf
            org.apache.hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart



          Subsystem Report/Notes
          Docker Image:yetus/hadoop:9560f25
          JIRA Issue YARN-5773
          JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12835490/YARN-5773.0004.patch
          Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
          uname Linux ebb678ad35fe 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
          Build tool maven
          Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
          git revision trunk / e29cba6
          Default Java 1.8.0_101
          findbugs v3.0.0
          unit https://builds.apache.org/job/PreCommit-YARN-Build/13535/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt
          Test Results https://builds.apache.org/job/PreCommit-YARN-Build/13535/testReport/
          modules C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
          Console output https://builds.apache.org/job/PreCommit-YARN-Build/13535/console
          Powered by Apache Yetus 0.4.0-SNAPSHOT http://yetus.apache.org

          This message was automatically generated.

          Hadoop QA added a comment -
          -1 overall



          Vote Subsystem Runtime Comment
          0 reexec 0m 12s Docker mode activated.
          +1 @author 0m 0s The patch does not contain any @author tags.
          -1 test4tests 0m 0s The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
          +1 mvninstall 7m 8s trunk passed
          +1 compile 0m 32s trunk passed
          +1 checkstyle 0m 21s trunk passed
          +1 mvnsite 0m 38s trunk passed
          +1 mvneclipse 0m 18s trunk passed
          +1 findbugs 1m 0s trunk passed
          +1 javadoc 0m 20s trunk passed
          +1 mvninstall 0m 30s the patch passed
          +1 compile 0m 29s the patch passed
          +1 javac 0m 29s the patch passed
          +1 checkstyle 0m 18s the patch passed
          +1 mvnsite 0m 33s the patch passed
          +1 mvneclipse 0m 14s the patch passed
          +1 whitespace 0m 0s The patch has no whitespace issues.
          +1 findbugs 1m 10s the patch passed
          +1 javadoc 0m 18s the patch passed
          +1 unit 37m 12s hadoop-yarn-server-resourcemanager in the patch passed.
          +1 asflicense 0m 19s The patch does not generate ASF License warnings.
          52m 45s



          Subsystem Report/Notes
          Docker Image:yetus/hadoop:9560f25
          JIRA Issue YARN-5773
          JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12835492/YARN-5773.0005.patch
          Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
          uname Linux 4ce558c0fd2b 3.13.0-95-generic #142-Ubuntu SMP Fri Aug 12 17:00:09 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
          Build tool maven
          Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
          git revision trunk / 0bdd263
          Default Java 1.8.0_101
          findbugs v3.0.0
          Test Results https://builds.apache.org/job/PreCommit-YARN-Build/13536/testReport/
          modules C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
          Console output https://builds.apache.org/job/PreCommit-YARN-Build/13536/console
          Powered by Apache Yetus 0.4.0-SNAPSHOT http://yetus.apache.org

          This message was automatically generated.

          bibinchundatt Bibin A Chundatt added a comment -

          Please ignore the YARN-5773.0004 patch. Attaching the latest patch.

          bibinchundatt Bibin A Chundatt added a comment -

          Varun Saxena
          If I understand correctly, the earlier implementation also followed the same approach.

          varun_saxena Varun Saxena added a comment -

          Bibin A Chundatt,
          The check below on getNumActiveApplications is not required.
          If cluster resources are 0, there is no point in activating even one app.

          if (!Resources.greaterThan(resourceCalculator, lastClusterResource,
              lastClusterResource, Resources.none())
              && !(getNumActiveApplications() < 1)) {
            return;
          }
          

          I think a log can be added under the condition, maybe at DEBUG log level to avoid too many logs.

          Cluster resource in the UI is self-explanatory, so is it required to add this?

          I am +0 on this. Sunil G, your thoughts on this ?

          bibinchundatt Bibin A Chundatt added a comment -

          Varun Saxena
          Thank you for the review comments. Attached the latest patch addressing all of them.

          bibinchundatt Bibin A Chundatt added a comment -

          IIUC, SchedulerApplicationAttempt#isRecovering is set only in the following case. Since the app is in ACCEPTED state, I am not sure we will get isRecovering=true:

                  // We will replay the final attempt only if last attempt is in final
                  // state but application is not in final state.
                  if (rmApp.getCurrentAppAttempt() == appAttempt
                      && !RMAppImpl.isAppInFinalState(rmApp)) {
                    // Add the previous finished attempt to scheduler synchronously so
                    // that scheduler knows the previous attempt.
                    appAttempt.scheduler.handle(new AppAttemptAddedSchedulerEvent(
                      appAttempt.getAppAttemptId(), false, true));
                    (new BaseFinalTransition(appAttempt.recoveredFinalState)).transition(
                        appAttempt, event);
                  }
          

          Should we update AM diagnostics if we return right from the beginning of activateApplications?

          Cluster resource in the UI is self-explanatory, so is it required to add this?

          Also, in the patch LOG.debug statements should be guarded with LOG.isDebugEnabled check

          Will update the same in the next patch.
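
          For reference, a minimal sketch of the guard being discussed (assuming a commons-logging Log field named LOG and an applicationId variable in scope; the message text is illustrative only, not taken from the patch):

            if (LOG.isDebugEnabled()) {
              LOG.debug("Not activating application " + applicationId
                  + " since cluster resource is still zero during recovery");
            }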

          varun_saxena Varun Saxena added a comment -

          Now the question here is to invoke activateApplications after scheduler recovery is done.

          Do we need to call activateApplications at all? Applications would be activated when nodes re-register, right?

          Coming to the code in the patch, checking against the cluster resource being 0 should be fine, but resources may be 0 in cases other than recovery, so should we update AM diagnostics if we return right from the beginning of activateApplications? Also, a log should be added.
          One more thing: whether an attempt is recovering or not can be obtained from SchedulerApplicationAttempt#isRecovering, which is populated from AppAttemptAddedSchedulerEvent. This can also be used to differentiate between resources being 0 because the app is recovering and 0 because no nodes have registered.

          Also, in the patch LOG.debug statements should be guarded with LOG.isDebugEnabled check

          hadoopqa Hadoop QA added a comment -
          -1 overall



          Vote Subsystem Runtime Comment
          0 reexec 0m 16s Docker mode activated.
          +1 @author 0m 0s The patch does not contain any @author tags.
          -1 test4tests 0m 0s The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
          +1 mvninstall 6m 46s trunk passed
          +1 compile 0m 32s trunk passed
          +1 checkstyle 0m 21s trunk passed
          +1 mvnsite 0m 38s trunk passed
          +1 mvneclipse 0m 16s trunk passed
          +1 findbugs 0m 58s trunk passed
          +1 javadoc 0m 20s trunk passed
          +1 mvninstall 0m 30s the patch passed
          +1 compile 0m 30s the patch passed
          +1 javac 0m 30s the patch passed
          +1 checkstyle 0m 18s the patch passed
          +1 mvnsite 0m 36s the patch passed
          +1 mvneclipse 0m 14s the patch passed
          +1 whitespace 0m 0s The patch has no whitespace issues.
          +1 findbugs 1m 3s the patch passed
          +1 javadoc 0m 18s the patch passed
          +1 unit 35m 55s hadoop-yarn-server-resourcemanager in the patch passed.
          +1 asflicense 0m 15s The patch does not generate ASF License warnings.
          50m 23s



          Subsystem Report/Notes
          Docker Image:yetus/hadoop:9560f25
          JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12835100/YARN-5773.003.patch
          JIRA Issue YARN-5773
          Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
          uname Linux 76552f764153 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
          Build tool maven
          Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
          git revision trunk / 24a83fe
          Default Java 1.8.0_101
          findbugs v3.0.0
          Test Results https://builds.apache.org/job/PreCommit-YARN-Build/13521/testReport/
          modules C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
          Console output https://builds.apache.org/job/PreCommit-YARN-Build/13521/console
          Powered by Apache Yetus 0.3.0 http://yetus.apache.org

          This message was automatically generated.

          bibinchundatt Bibin A Chundatt added a comment - - edited

          Sunil G has raised JIRA YARN-5781 for handling the optimization cases.

          sunilg Sunil G added a comment -

          In the scheduler, as of now there are no events/APIs to know whether recovery is done or not.
          Basically, apps could be submitted even when none of the nodes are registered. I understand that the easy fix here is to skip invoking activateApplications during recovery; it has no meaning in the recovery flow anyway. So, as I mentioned earlier and as done in the patch, we can have this.
          Now the question here is whether to invoke activateApplications once after scheduler recovery is done. The advantage of such a call is that the scheduler no longer needs to worry about the chain of sequential steps in RM recovery (whether active services are started last in those steps or not).
          Since recovery is event based, a direct API will not be correct; rather, one more event needs to be published. I will check the feasibility of this and update here.

          bibinchundatt Bibin A Chundatt added a comment -

          we could only invoke activateApplications once after recovering all apps

          Is there any issue with the current handling based on cluster resource? activateApplications() gets invoked in the following cases:

          1. Application finish
          2. Queue reinitialization (refresh, scheduler service start)
          3. Attempt add
          4. Node add, cluster resource update, etc.
            Handling recovery using the cluster resource, I felt, can cover all these cases, along with the change to the log that is otherwise too costly. Else we would have to handle each case separately.
            We do have resource-based handling in many cases, e.g. assignContainers based on pending resources, right?

          Regarding recovery time, I am not sure we can know from the scheduler side whether recovery is completed for all apps, since it is event based. As Rohith mentioned earlier, app recovery from the store and scheduler-side app recovery are different.

          sunilg Sunil G added a comment -

          Currently we are trying to invoke activateApplications while recovering each application. Yes, as of now nodes get registered later in the flow, but the scheduler should not have to consider such timing cases from the RMAppManager/RM end. That being said, it is important to separate two issues out here:
          1. The recovery call flow for each app in the scheduler should not invoke activateApplications every time.
          2. activateApplications itself could be improved by considering the AM headroom. But that could be done in another ticket, as this one focuses on fixing the recovery call flow.
          To address issue 1, we could invoke activateApplications only once after recovering all apps. By this, we can remove the timing dependency on the RM end for recovery. With this change, even if there is a change in the RM recovery model, the scheduler would have completed its recovery flow without causing any performance issue or waiting for ResourceTrackerService to register nodes. Thanks Wangda Tan for the thoughts.
          Thoughts?

          bibinchundatt Bibin A Chundatt added a comment -

          Attaching a patch to handle only the recovery case.

          varun_saxena Varun Saxena added a comment -

          Optimizing activateApplication() can be handled in new JIRA. Thoughts ?

          Agree. I think it should be handled separately.

          varun_saxena Varun Saxena added a comment - - edited

          Is there any need to activate applications on recovery? Cluster resources will anyway be 0 on recovery, as the resource tracker service has not yet started. Maybe pass it in the event so that the scheduler knows recovery is happening while adding the attempt.
          We can, however, check the cluster resources or user limit right at the beginning of activateApplications and come out of it if the applicable resources are 0. That will have the same impact on recovery.

          Overall, i.e. in the normal flow, Wangda's suggestion to optimize activateApplications sounds good. But the ordering policy will have to be maintained as well, right?

          bibinchundatt Bibin A Chundatt added a comment - - edited

          Thank you Sunil G, Wangda and Varun.

          The recovery scenario will be handled based on the comments, along with the log level change.
          Optimizing activateApplications() can be handled in a new JIRA.
          Thoughts?

          bibinchundatt Bibin A Chundatt added a comment -

          3. As mentioned by Bibin A Chundatt, when each app fails to get activated due to the upper cut of resource limit, one INFO log is emitted (because amLimit is 0). During recovery, this is costly.

          Thanks Sunil G for mentioning the logging; I missed pointing it out in my earlier comment.
          The logging is costly during recovery, since amLimit will always be zero:

            LOG.info("Not activating application " + applicationId
                          + " as  amIfStarted: " + amIfStarted + " exceeds amLimit: "
                          + amLimit);
          
          sunilg Sunil G added a comment - - edited

          Issues in Recovery of apps:
          1. activateApplications works under a write lock.
          2. If one application is found to overflow the AM resource limit, instead of breaking from the loop, we continue and replay all apps from pendingOrderingPolicy. We may need to iterate over all apps because apps belong to different partitions and pendingOrderingPolicy does not provide any ordering of apps based on partition.
          3. As mentioned by Bibin A Chundatt, when each app fails to get activated due to the resource limit cap, one INFO log is emitted (because amLimit is 0). During recovery, this is costly.

          Wangda Tan and Rohith Sharma K S

          If a given app's AM resource amount > AM headroom, should we skip it and activate the following app whose AM resource amount <= AM headroom?

          But one point to be considered is that the headroom changes with each node registration. So, the user headroom changes as new nodes register. This needs to be taken care of.

          Currently activateApplications is invoked when there is a change in cluster resource. So any change in cluster resource will ensure a call to activateApplications and we can recalculate this headroom. I am not very sure about the suggested map. Will this check come before the existing AM resource percentage check for the queue/partition (not user based), or are we replacing those checks?

          rohithsharma Rohith Sharma K S added a comment -

          Thanks folks for the discussion.
          I went through the overall discussion above, and I have one doubt: how can RM recovery be too slow? In the current RM restart, there are 2 stages.

          1. Recover: Read all the application data from ZooKeeper and replay it. Basically, for running/pending apps, an event will be triggered to the scheduler, and the scheduler has a separate dispatcher to handle it.
          2. Service start: Once the recovery process is completed, all the RM services are started.
            IIUC, the RM service is up and able to accept new requests from clients. So the problem is that after RM service start, activating applications is delayed because nodes are not yet registered, rather than the actual recovery being slow. It would be better if the JIRA summary were updated to something like "Scheduler takes longer time for activating recovered apps when RM is restarted" or similar.

          As far as the improvement goes, as Wangda suggested, maybe we can keep a Map<UserName, List<Application>> which would optimize the headroom check in activateApplications. But one point to be considered is that the headroom changes with each node registration. So, the user headroom changes as new nodes register. This needs to be taken care of.

          bibinchundatt Bibin A Chundatt added a comment -

          If a given app's AM resource amount > AM headroom, should we skip it and activate the following app whose AM resource amount <= AM headroom?

          Skip all apps only when queueUsage.getAMUsed > amLimit. Since AMs can be from different partitions and each partition can have a different AM limit, the AM limit has to be exceeded for all partitions.

          Check both cases before iterating through all the apps:

                if (!Resources.greaterThan(resourceCalculator, lastClusterResource,
                    lastClusterResource, Resources.none())
                    && !(getNumActiveApplications() < 1)) {
                  return;
                }

                Map<String, Resource> userAmPartitionLimit =
                    new HashMap<String, Resource>();

                // AM Resource Limit for accessible labels can be pre-calculated.
                // This will help in updating AMResourceLimit for all labels when queue
                // is initialized for the first time (when no applications are present).
                for (String nodePartition : getNodeLabelsForQueue()) {
                  calculateAndGetAMResourceLimitPerPartition(nodePartition);
                }

                // allPartitionLimitsExceeded() is a to-be-written helper that checks
                // whether the AM limit is already exceeded for every accessible partition.
                if (allPartitionLimitsExceeded() && !(getNumActiveApplications() < 1)) {
                  return;
                }
          
          varun_saxena Varun Saxena added a comment -

          Sunil G, we still need to add the apps to pendingOrderingPolicy. It's just that there is no need to run over all the pending apps on recovery of each unfinished app, as NMs have not yet registered (they won't until recovery finishes). Iterating over all the apps on recovery of each unfinished app is, I feel, unnecessary, as it will hit the same condition time and again and be unable to activate any application.

          leftnoteasy Wangda Tan added a comment -

          I feel we may need an overhaul of the existing activateApplications:

          If we describe what activateApplications targets to solve:
          Given a set of pending applications in a queue, where each application belongs to one user, different applications have different AM requests, each user has a quota, and the queue has a total quota, determine which applications will be activated.

          There's an additional question:
          If a given app's AM resource amount > AM headroom, should we skip it and activate the following app whose AM resource amount <= AM headroom?

          If the answer to the above question is yes, we can maintain a map Map<UserName, List<Application>>: when doing application activation, we don't need to check all the apps; instead, in most cases we only need to check each user once.
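
          A rough, hypothetical sketch of that idea (PendingApp, getUserAmHeadroom, fits and activate are placeholders, not actual CapacityScheduler types or methods; headroom bookkeeping is omitted for brevity):

            Map<String, List<PendingApp>> pendingAppsPerUser = new HashMap<>();

            void activateApplications(Resource queueAmHeadroom) {
              for (Map.Entry<String, List<PendingApp>> e : pendingAppsPerUser.entrySet()) {
                Resource userAmHeadroom = getUserAmHeadroom(e.getKey());
                Iterator<PendingApp> it = e.getValue().iterator();
                while (it.hasNext()) {
                  PendingApp app = it.next();
                  // Stop scanning this user's list as soon as one AM does not fit,
                  // instead of walking every pending app in the queue.
                  if (!fits(app.getAMResource(), userAmHeadroom, queueAmHeadroom)) {
                    break;
                  }
                  activate(app);
                  it.remove();
                }
              }
            }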

          bibinchundatt Bibin A Chundatt added a comment -

          Sunil G
          Till then, only one app will be activated and the rest of the apps will be in pending state.

          • So for N-1 applications the AM limit check happens about (N-1)(N-2)/2 times, right? And we are sure it will not be satisfied, since node registration happens later. Correct me if I am wrong. So for all those apps it is not required to check the AM limit, right?
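          As a rough back-of-the-envelope illustration: scanning all earlier pending apps on every recovered attempt means about 1 + 2 + ... + (N-1), i.e. roughly N²/2, AM-limit evaluations, so for N = 10,000 recovered applications that is on the order of 50 million checks before a single NodeManager has registered.
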
          sunilg Sunil G added a comment -

          1. If cluster resource is zero, don't check the AM limit. 2. Skip all apps if the queue's AM limit is reached.

          I am not so sure about this. recover happens first for all apps, and a recover event will be fired for each app. serviceStart happens later, so NMs will connect to the RM later. Till then, only one app will be activated and the rest of the apps will be in pending state. As NMs come up and register, the remaining apps will become activated from pendingOrderingPolicy.

          bibinchundatt Bibin A Chundatt added a comment -

          Thank you Wangda Tan for the review comment.

          I'm not sure if this is safe: activateApplications is mainly there to avoid too many applications running inside one queue. If we skip the AM limit check for recovering apps, it looks like some problems may occur.

          Yes, we should not skip activating applications.

          The RM restart issue with too many pending apps was the main intention of this JIRA. If there are too many pending apps in a leaf queue and the RM is restarted, LeafQueue#activateApplications() gets invoked for every recovered app attempt, and for each pending app the AM limit is checked. Restart time therefore grows with the number of apps, consuming too much time on restart.

          Will handle the following two:

          1. If cluster resource is zero, don't check the AM limit.
          2. Skip all apps if the queue's AM limit is reached.
            Will upload a patch soon.

          hadoopqa Hadoop QA added a comment -
          -1 overall



          Vote Subsystem Runtime Comment
          0 reexec 0m 12s Docker mode activated.
          +1 @author 0m 0s The patch does not contain any @author tags.
          -1 test4tests 0m 0s The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
          +1 mvninstall 7m 10s trunk passed
          +1 compile 0m 32s trunk passed
          +1 checkstyle 0m 23s trunk passed
          +1 mvnsite 0m 37s trunk passed
          +1 mvneclipse 0m 18s trunk passed
          +1 findbugs 0m 57s trunk passed
          +1 javadoc 0m 20s trunk passed
          +1 mvninstall 0m 30s the patch passed
          +1 compile 0m 28s the patch passed
          +1 javac 0m 28s the patch passed
          -1 checkstyle 0m 20s hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: The patch generated 1 new + 209 unchanged - 0 fixed = 210 total (was 209)
          +1 mvnsite 0m 36s the patch passed
          +1 mvneclipse 0m 14s the patch passed
          +1 whitespace 0m 0s The patch has no whitespace issues.
          +1 findbugs 1m 2s the patch passed
          -1 javadoc 0m 18s hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager generated 3 new + 938 unchanged - 0 fixed = 941 total (was 938)
          -1 unit 34m 51s hadoop-yarn-server-resourcemanager in the patch failed.
          +1 asflicense 0m 16s The patch does not generate ASF License warnings.
          49m 40s



          Reason Tests
          Failed junit tests hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacityScheduler
            hadoop.yarn.server.resourcemanager.TestRMRestart
            hadoop.yarn.server.resourcemanager.scheduler.capacity.TestApplicationPriority



          Subsystem Report/Notes
          Docker Image:yetus/hadoop:9560f25
          JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12834978/YARN-5773.0002.patch
          JIRA Issue YARN-5773
          Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
          uname Linux e6af7d98acab 3.13.0-95-generic #142-Ubuntu SMP Fri Aug 12 17:00:09 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
          Build tool maven
          Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
          git revision trunk / b18f35f
          Default Java 1.8.0_101
          findbugs v3.0.0
          checkstyle https://builds.apache.org/job/PreCommit-YARN-Build/13490/artifact/patchprocess/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt
          javadoc https://builds.apache.org/job/PreCommit-YARN-Build/13490/artifact/patchprocess/diff-javadoc-javadoc-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt
          unit https://builds.apache.org/job/PreCommit-YARN-Build/13490/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt
          unit test logs https://builds.apache.org/job/PreCommit-YARN-Build/13490/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt
          Test Results https://builds.apache.org/job/PreCommit-YARN-Build/13490/testReport/
          modules C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
          Console output https://builds.apache.org/job/PreCommit-YARN-Build/13490/console
          Powered by Apache Yetus 0.3.0 http://yetus.apache.org

          This message was automatically generated.

          leftnoteasy Wangda Tan added a comment -

          Thanks Bibin A Chundatt for reporting and working on this issue.

          I'm not sure if this is safe: activateApplications is mainly there to avoid too many applications running inside one queue. If we skip the AM limit check for recovering apps, it looks like some problems may occur. For example, if a cluster with 4K nodes restarts and only 2K nodes come back, should we activate only some of the originally submitted apps?

          In my mind we need to optimize the activateApplications method; right now it scans through all pending apps inside the queue under all conditions. We should be able to optimize this, for example by skipping all apps if the queue's AM limit is reached.

          hadoopqa Hadoop QA added a comment -
          -1 overall



          Vote Subsystem Runtime Comment
          0 reexec 0m 13s Docker mode activated.
          +1 @author 0m 0s The patch does not contain any @author tags.
          -1 test4tests 0m 0s The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
          +1 mvninstall 7m 21s trunk passed
          +1 compile 0m 35s trunk passed
          +1 checkstyle 0m 24s trunk passed
          +1 mvnsite 0m 40s trunk passed
          +1 mvneclipse 0m 17s trunk passed
          +1 findbugs 1m 1s trunk passed
          +1 javadoc 0m 21s trunk passed
          +1 mvninstall 0m 30s the patch passed
          +1 compile 0m 29s the patch passed
          +1 javac 0m 29s the patch passed
          -1 checkstyle 0m 21s hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: The patch generated 1 new + 210 unchanged - 0 fixed = 211 total (was 210)
          +1 mvnsite 0m 35s the patch passed
          +1 mvneclipse 0m 14s the patch passed
          +1 whitespace 0m 0s The patch has no whitespace issues.
          +1 findbugs 1m 1s the patch passed
          -1 javadoc 0m 18s hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager generated 3 new + 938 unchanged - 0 fixed = 941 total (was 938)
          +1 unit 35m 22s hadoop-yarn-server-resourcemanager in the patch passed.
          +1 asflicense 0m 18s The patch does not generate ASF License warnings.
          50m 39s



          Subsystem Report/Notes
          Docker Image:yetus/hadoop:9560f25
          JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12834967/YARN-5773.0001.patch
          JIRA Issue YARN-5773
          Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
          uname Linux 03b6c1d73bc8 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
          Build tool maven
          Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
          git revision trunk / b18f35f
          Default Java 1.8.0_101
          findbugs v3.0.0
          checkstyle https://builds.apache.org/job/PreCommit-YARN-Build/13488/artifact/patchprocess/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt
          javadoc https://builds.apache.org/job/PreCommit-YARN-Build/13488/artifact/patchprocess/diff-javadoc-javadoc-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt
          Test Results https://builds.apache.org/job/PreCommit-YARN-Build/13488/testReport/
          modules C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
          Console output https://builds.apache.org/job/PreCommit-YARN-Build/13488/console
          Powered by Apache Yetus 0.3.0 http://yetus.apache.org

          This message was automatically generated.

          bibinchundatt Bibin A Chundatt added a comment -

          Attaching a patch for the same. On recovery, the capacity scheduler indicates whether an attempt is of recovery type or not. The patch skips LeafQueue#activateApplications() when the attempt is of recovery type.
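
          The change is roughly along the lines of the following sketch (the boolean parameter and the method shape are illustrative only; the actual patch and the real LeafQueue signatures may differ):

            // Illustrative only: the recovery flag carried by AppAttemptAddedSchedulerEvent
            // is propagated down so that the expensive activation scan is skipped for
            // recovered attempts.
            void addApplicationAttempt(FiCaSchedulerApp application, User user,
                boolean isAttemptRecovering) {
              // ... existing bookkeeping for the newly added attempt ...
              if (!isAttemptRecovering) {
                activateApplications();
              }
            }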

          varun_saxena Varun Saxena added a comment - - edited

          Thanks Bibin A Chundatt for filing the JIRA.
          Agree that we do not need to iterate over all the pending apps on recovery, as NMs are not yet registered.
          If there are a large number of running apps, the RM unnecessarily spends quite a bit of time in this loop.

          Applications can be activated as and when NMs register again.

          bibinchundatt Bibin A Chundatt added a comment -

          Solution
          The following code, which in effect skips activateApplications() on recovery, solved the problem.

          private synchronized void activateApplications() {
            // Nothing to activate while the cluster resource is still zero
            // (i.e. before any NodeManager has registered after recovery).
            if (!Resources.greaterThan(resourceCalculator, lastClusterResource,
                lastClusterResource, Resources.none())) {
              return;
            }
            ...
          

          Thoughts ???


            People

            • Assignee: bibinchundatt Bibin A Chundatt
            • Reporter: bibinchundatt Bibin A Chundatt
            • Votes: 0
            • Watchers: 15
