Details

    • Type: Sub-task Sub-task
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.1.0-beta
    • Component/s: resourcemanager
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      Currently,AM max attempts is only checked if the current attempt fails and check to see whether to create new attempt. If the RM restarts before the max-attempt fails, it'll not clean the state store, when RM comes back, it will retry attempt again.

      1. YARN-534.5.patch
        3 kB
        Jian He
      2. YARN-534.4.patch
        12 kB
        Jian He
      3. YARN-534.3.patch
        12 kB
        Jian He
      4. YARN-534.2.patch
        7 kB
        Jian He
      5. YARN-534.1.patch
        7 kB
        Jian He

        Issue Links

          Activity

          Hide
          Hudson added a comment -

          Integrated in Hadoop-Mapreduce-trunk #1406 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1406/)
          YARN-594. Update test and add comments in YARN-534 (Jian He via bikas) (Revision 1470243)

          Result = SUCCESS
          bikas : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1470243
          Files :

          • /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
          • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMAppManager.java
          • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMRestart.java
          Show
          Hudson added a comment - Integrated in Hadoop-Mapreduce-trunk #1406 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1406/ ) YARN-594 . Update test and add comments in YARN-534 (Jian He via bikas) (Revision 1470243) Result = SUCCESS bikas : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1470243 Files : /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMAppManager.java /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMRestart.java
          Hide
          Hudson added a comment -

          Integrated in Hadoop-Hdfs-trunk #1379 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk/1379/)
          YARN-594. Update test and add comments in YARN-534 (Jian He via bikas) (Revision 1470243)

          Result = FAILURE
          bikas : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1470243
          Files :

          • /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
          • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMAppManager.java
          • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMRestart.java
          Show
          Hudson added a comment - Integrated in Hadoop-Hdfs-trunk #1379 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk/1379/ ) YARN-594 . Update test and add comments in YARN-534 (Jian He via bikas) (Revision 1470243) Result = FAILURE bikas : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1470243 Files : /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMAppManager.java /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMRestart.java
          Hide
          Hudson added a comment -

          Integrated in Hadoop-Yarn-trunk #190 (See https://builds.apache.org/job/Hadoop-Yarn-trunk/190/)
          YARN-594. Update test and add comments in YARN-534 (Jian He via bikas) (Revision 1470243)

          Result = SUCCESS
          bikas : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1470243
          Files :

          • /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
          • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMAppManager.java
          • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMRestart.java
          Show
          Hudson added a comment - Integrated in Hadoop-Yarn-trunk #190 (See https://builds.apache.org/job/Hadoop-Yarn-trunk/190/ ) YARN-594 . Update test and add comments in YARN-534 (Jian He via bikas) (Revision 1470243) Result = SUCCESS bikas : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1470243 Files : /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMAppManager.java /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMRestart.java
          Hide
          Hudson added a comment -

          Integrated in Hadoop-trunk-Commit #3645 (See https://builds.apache.org/job/Hadoop-trunk-Commit/3645/)
          YARN-594. Update test and add comments in YARN-534 (Jian He via bikas) (Revision 1470243)

          Result = SUCCESS
          bikas : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1470243
          Files :

          • /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
          • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMAppManager.java
          • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMRestart.java
          Show
          Hudson added a comment - Integrated in Hadoop-trunk-Commit #3645 (See https://builds.apache.org/job/Hadoop-trunk-Commit/3645/ ) YARN-594 . Update test and add comments in YARN-534 (Jian He via bikas) (Revision 1470243) Result = SUCCESS bikas : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1470243 Files : /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMAppManager.java /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMRestart.java
          Hide
          Vinod Kumar Vavilapalli added a comment -

          Jian/Bikas, can we open a new ticket for this instead of continuing on a resolved issue? Tx.

          Show
          Vinod Kumar Vavilapalli added a comment - Jian/Bikas, can we open a new ticket for this instead of continuing on a resolved issue? Tx.
          Hide
          Jian He added a comment -

          new patch add comments in case of work-preserve restart and put two more assertions in the test case.

          Show
          Jian He added a comment - new patch add comments in case of work-preserve restart and put two more assertions in the test case.
          Hide
          Jian He added a comment -

          Address last comments.
          DEFAULT_RM_AM_MAX_ATTEMPTS is already overridden as 5 in the very beginning

          Show
          Jian He added a comment - Address last comments. DEFAULT_RM_AM_MAX_ATTEMPTS is already overridden as 5 in the very beginning
          Hide
          Bikas Saha added a comment -

          There needs to be a comment here that this logic needs to change with work preserving restart since then if attemptCount==maxAttempts then the job still needs to be recovered because the last attempt may still be running. I had made this comment earlier above.

          +      if(appState.getAttemptCount() >= maxAppAttempts) {
          +        LOG.info("Not recovering application " + appState.getAppId() +
          +            " due to recovering attempt is beyond maxAppAttempt limit");
          +        shouldRecover = false;
          +      }
          

          Since the current value of DEFAULT_RM_AM_MAX_ATTEMPTS==1 how do we know that the unmanaged AM is not being recovered because it is unmanaged or because its max attempt limit is reached?

          +    RMApp appUnmanaged = rm1.submitApp(200, "someApp", "someUser", null, true,
          +        null, conf.getInt(YarnConfiguration.RM_AM_MAX_ATTEMPTS,
          +          YarnConfiguration.DEFAULT_RM_AM_MAX_ATTEMPTS));
          

          I think this also needs to check that app2 exists in store and app1 does not exist in store. We dont want the RM to by mistake continue to store apps like app1 which it is not going to recover. The specific bug in this case would be to not call recover() but also forget to call store.removeApplication().

          +    // verify that app2 exists  app1 is removed
          +    Assert.assertEquals(1, rm2.getRMContext().getRMApps().size());
          +    Assert.assertNotNull(rm2.getRMContext().getRMApps()
          +        .get(app2.getApplicationId()));
          +    Assert.assertNull(rm2.getRMContext().getRMApps()
          +        .get(app1.getApplicationId()));
          +
          +    // stop the RM
          
          Show
          Bikas Saha added a comment - There needs to be a comment here that this logic needs to change with work preserving restart since then if attemptCount==maxAttempts then the job still needs to be recovered because the last attempt may still be running. I had made this comment earlier above. + if (appState.getAttemptCount() >= maxAppAttempts) { + LOG.info( "Not recovering application " + appState.getAppId() + + " due to recovering attempt is beyond maxAppAttempt limit" ); + shouldRecover = false ; + } Since the current value of DEFAULT_RM_AM_MAX_ATTEMPTS==1 how do we know that the unmanaged AM is not being recovered because it is unmanaged or because its max attempt limit is reached? + RMApp appUnmanaged = rm1.submitApp(200, "someApp" , "someUser" , null , true , + null , conf.getInt(YarnConfiguration.RM_AM_MAX_ATTEMPTS, + YarnConfiguration.DEFAULT_RM_AM_MAX_ATTEMPTS)); I think this also needs to check that app2 exists in store and app1 does not exist in store. We dont want the RM to by mistake continue to store apps like app1 which it is not going to recover. The specific bug in this case would be to not call recover() but also forget to call store.removeApplication(). + // verify that app2 exists app1 is removed + Assert.assertEquals(1, rm2.getRMContext().getRMApps().size()); + Assert.assertNotNull(rm2.getRMContext().getRMApps() + .get(app2.getApplicationId())); + Assert.assertNull(rm2.getRMContext().getRMApps() + .get(app1.getApplicationId())); + + // stop the RM
          Hide
          Hudson added a comment -

          Integrated in Hadoop-Mapreduce-trunk #1395 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1395/)
          YARN-534. Change RM restart recovery to also account for AM max-attempts configuration after the restart. Contributed by Jian He. (Revision 1466208)

          Result = SUCCESS
          vinodkv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1466208
          Files :

          • /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
          • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMAppManager.java
          • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockRM.java
          • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMRestart.java
          Show
          Hudson added a comment - Integrated in Hadoop-Mapreduce-trunk #1395 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1395/ ) YARN-534 . Change RM restart recovery to also account for AM max-attempts configuration after the restart. Contributed by Jian He. (Revision 1466208) Result = SUCCESS vinodkv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1466208 Files : /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMAppManager.java /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockRM.java /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMRestart.java
          Hide
          Hudson added a comment -

          Integrated in Hadoop-Hdfs-trunk #1368 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk/1368/)
          YARN-534. Change RM restart recovery to also account for AM max-attempts configuration after the restart. Contributed by Jian He. (Revision 1466208)

          Result = FAILURE
          vinodkv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1466208
          Files :

          • /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
          • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMAppManager.java
          • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockRM.java
          • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMRestart.java
          Show
          Hudson added a comment - Integrated in Hadoop-Hdfs-trunk #1368 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk/1368/ ) YARN-534 . Change RM restart recovery to also account for AM max-attempts configuration after the restart. Contributed by Jian He. (Revision 1466208) Result = FAILURE vinodkv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1466208 Files : /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMAppManager.java /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockRM.java /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMRestart.java
          Hide
          Hudson added a comment -

          Integrated in Hadoop-Yarn-trunk #179 (See https://builds.apache.org/job/Hadoop-Yarn-trunk/179/)
          YARN-534. Change RM restart recovery to also account for AM max-attempts configuration after the restart. Contributed by Jian He. (Revision 1466208)

          Result = SUCCESS
          vinodkv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1466208
          Files :

          • /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
          • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMAppManager.java
          • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockRM.java
          • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMRestart.java
          Show
          Hudson added a comment - Integrated in Hadoop-Yarn-trunk #179 (See https://builds.apache.org/job/Hadoop-Yarn-trunk/179/ ) YARN-534 . Change RM restart recovery to also account for AM max-attempts configuration after the restart. Contributed by Jian He. (Revision 1466208) Result = SUCCESS vinodkv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1466208 Files : /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMAppManager.java /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockRM.java /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMRestart.java
          Hide
          Hudson added a comment -

          Integrated in Hadoop-trunk-Commit #3585 (See https://builds.apache.org/job/Hadoop-trunk-Commit/3585/)
          YARN-534. Change RM restart recovery to also account for AM max-attempts configuration after the restart. Contributed by Jian He. (Revision 1466208)

          Result = SUCCESS
          vinodkv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1466208
          Files :

          • /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
          • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMAppManager.java
          • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockRM.java
          • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMRestart.java
          Show
          Hudson added a comment - Integrated in Hadoop-trunk-Commit #3585 (See https://builds.apache.org/job/Hadoop-trunk-Commit/3585/ ) YARN-534 . Change RM restart recovery to also account for AM max-attempts configuration after the restart. Contributed by Jian He. (Revision 1466208) Result = SUCCESS vinodkv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1466208 Files : /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMAppManager.java /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockRM.java /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMRestart.java
          Hide
          Vinod Kumar Vavilapalli added a comment -

          I just committed this to trunk and branch-2. Thanks Jian!

          Show
          Vinod Kumar Vavilapalli added a comment - I just committed this to trunk and branch-2. Thanks Jian!
          Hide
          Vinod Kumar Vavilapalli added a comment -

          The latest patch looks good, +1.

          Eclipse problem is due to unclean .m2 on build machine, and it passes on my laptop.

          Checking this in.

          Show
          Vinod Kumar Vavilapalli added a comment - The latest patch looks good, +1. Eclipse problem is due to unclean .m2 on build machine, and it passes on my laptop. Checking this in.
          Hide
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12577682/YARN-534.4.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 2 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          -1 eclipse:eclipse. The patch failed to build with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-YARN-Build/692//testReport/
          Console output: https://builds.apache.org/job/PreCommit-YARN-Build/692//console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - -1 overall . Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12577682/YARN-534.4.patch against trunk revision . +1 @author . The patch does not contain any @author tags. +1 tests included . The patch appears to include 2 new or modified test files. +1 javac . The applied patch does not increase the total number of javac compiler warnings. +1 javadoc . The javadoc tool did not generate any warning messages. -1 eclipse:eclipse . The patch failed to build with eclipse:eclipse. +1 findbugs . The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit . The applied patch does not increase the total number of release audit warnings. +1 core tests . The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. +1 contrib tests . The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/692//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/692//console This message is automatically generated.
          Hide
          Jian He added a comment -

          fixed the test failure

          Show
          Jian He added a comment - fixed the test failure
          Hide
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12577674/YARN-534.3.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 2 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

          org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesApps

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-YARN-Build/690//testReport/
          Console output: https://builds.apache.org/job/PreCommit-YARN-Build/690//console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - -1 overall . Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12577674/YARN-534.3.patch against trunk revision . +1 @author . The patch does not contain any @author tags. +1 tests included . The patch appears to include 2 new or modified test files. +1 javac . The applied patch does not increase the total number of javac compiler warnings. +1 javadoc . The javadoc tool did not generate any warning messages. +1 eclipse:eclipse . The patch built with eclipse:eclipse. +1 findbugs . The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit . The applied patch does not increase the total number of release audit warnings. -1 core tests . The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesApps +1 contrib tests . The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/690//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/690//console This message is automatically generated.
          Hide
          Jian He added a comment -

          uploaded a new patch based on last comment

          Show
          Jian He added a comment - uploaded a new patch based on last comment
          Hide
          Vinod Kumar Vavilapalli added a comment -

          The above two comments are for the test-case.

          Show
          Vinod Kumar Vavilapalli added a comment - The above two comments are for the test-case.
          Hide
          Vinod Kumar Vavilapalli added a comment -

          Also,

          • the setting of maxAppAttempts should happen before submit.
          • RM1 should be stopped before RM2 starts?
          Show
          Vinod Kumar Vavilapalli added a comment - Also, the setting of maxAppAttempts should happen before submit. RM1 should be stopped before RM2 starts?
          Hide
          Vinod Kumar Vavilapalli added a comment -

          Patch looks mostly good, couple of comments:

          • shouldRecovery should be shouldRecover
          • We aren't validating the code path to do with apps with -ve or illegal max-app-attempts, can we add one?
          Show
          Vinod Kumar Vavilapalli added a comment - Patch looks mostly good, couple of comments: shouldRecovery should be shouldRecover We aren't validating the code path to do with apps with -ve or illegal max-app-attempts, can we add one?
          Hide
          Hadoop QA added a comment -

          +1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12577639/YARN-534.2.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 1 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-YARN-Build/684//testReport/
          Console output: https://builds.apache.org/job/PreCommit-YARN-Build/684//console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - +1 overall . Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12577639/YARN-534.2.patch against trunk revision . +1 @author . The patch does not contain any @author tags. +1 tests included . The patch appears to include 1 new or modified test files. +1 javac . The applied patch does not increase the total number of javac compiler warnings. +1 javadoc . The javadoc tool did not generate any warning messages. +1 eclipse:eclipse . The patch built with eclipse:eclipse. +1 findbugs . The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit . The applied patch does not increase the total number of release audit warnings. +1 core tests . The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. +1 contrib tests . The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/684//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/684//console This message is automatically generated.
          Hide
          Jian He added a comment -

          update the patch based on last comment

          Show
          Jian He added a comment - update the patch based on last comment
          Hide
          Bikas Saha added a comment -

          Looks good overall.

          Can the message be improved a bit? e.g. because maxAppAttempts have already been started. Secondly, there needs to be a comment here mentioning YARN-556 because this code needs to change when work preserving restart happens, right?

          +          LOG.info("Not recovering application " + appState.getAppId() +
          +              " due to hit maxAppAttempts limit");
          

          I dont think this is needed anymore because the capacity scheduler bug is fixed. Earlier capacity scheduler had a bug that made applications remain in pending state.

          +    conf.set(YarnConfiguration.RM_SCHEDULER, 
          +    "org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler");
          

          In the test can we also start another mockrm that uses the save app submission context to determine that max attempts that been reached. The global limit will have to be increased at that time.

          Can the test stop all started RM's?

          Minor suggestion in the code structure in RMAppManager.recover(). IMO, we can use a boolean flag shouldRecover which will be set to false if app is unmanaged or max retries is reached. That way we dont have to duplicate the remove logic.

          Show
          Bikas Saha added a comment - Looks good overall. Can the message be improved a bit? e.g. because maxAppAttempts have already been started. Secondly, there needs to be a comment here mentioning YARN-556 because this code needs to change when work preserving restart happens, right? + LOG.info( "Not recovering application " + appState.getAppId() + + " due to hit maxAppAttempts limit" ); I dont think this is needed anymore because the capacity scheduler bug is fixed. Earlier capacity scheduler had a bug that made applications remain in pending state. + conf.set(YarnConfiguration.RM_SCHEDULER, + "org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler" ); In the test can we also start another mockrm that uses the save app submission context to determine that max attempts that been reached. The global limit will have to be increased at that time. Can the test stop all started RM's? Minor suggestion in the code structure in RMAppManager.recover(). IMO, we can use a boolean flag shouldRecover which will be set to false if app is unmanaged or max retries is reached. That way we dont have to duplicate the remove logic.
          Hide
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12577611/YARN-534.1.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 1 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          -1 eclipse:eclipse. The patch failed to build with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-YARN-Build/683//testReport/
          Console output: https://builds.apache.org/job/PreCommit-YARN-Build/683//console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - -1 overall . Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12577611/YARN-534.1.patch against trunk revision . +1 @author . The patch does not contain any @author tags. +1 tests included . The patch appears to include 1 new or modified test files. +1 javac . The applied patch does not increase the total number of javac compiler warnings. +1 javadoc . The javadoc tool did not generate any warning messages. -1 eclipse:eclipse . The patch failed to build with eclipse:eclipse. +1 findbugs . The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit . The applied patch does not increase the total number of release audit warnings. +1 core tests . The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. +1 contrib tests . The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/683//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/683//console This message is automatically generated.
          Hide
          Jian He added a comment -

          Upload a patch, check maxAppAttempts on restart and add a test

          Show
          Jian He added a comment - Upload a patch, check maxAppAttempts on restart and add a test
          Hide
          Bikas Saha added a comment -

          Turns out that the max attempts limit is checked when job fails (and tries to launch new attempt) and not when the new attempt is actually being launched.
          The RM on restart, could choose to remove applications that have already hit the limit.

          Show
          Bikas Saha added a comment - Turns out that the max attempts limit is checked when job fails (and tries to launch new attempt) and not when the new attempt is actually being launched. The RM on restart, could choose to remove applications that have already hit the limit.

            People

            • Assignee:
              Jian He
              Reporter:
              Jian He
            • Votes:
              0 Vote for this issue
              Watchers:
              9 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development