Hadoop YARN
  1. Hadoop YARN
  2. YARN-542

Change the default global AM max-attempts value to be not one

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.1.0-beta
    • Component/s: None
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      Today, the global AM max-attempts is set to 1 which is a bad choice. AM max-attempts accounts for both AM level failures as well as container crashes due to localization issue, lost nodes etc. To account for AM crashes due to problems that are not caused by user code, mainly lost nodes, we want to give AMs some retires.

      I propose we change it to atleast two. Can change it to 4 to match other retry-configs.

        Issue Links

          Activity

          Hide
          Zhijie Shen added a comment -

          @Vinod , how about conservatively changing max attempts to 2, if there's no strong motivation to give AM multiple retry chances?

          Show
          Zhijie Shen added a comment - @Vinod , how about conservatively changing max attempts to 2, if there's no strong motivation to give AM multiple retry chances?
          Hide
          Zhijie Shen added a comment -

          I've drafted a patch, which includes the following modifications:

          1. Change the default value of yarn.resourcemanager.am.max-attempts from 1 to 2.

          2. In the test cases, where more than one attempt is set, YarnConfiguration.DEFAULT_RM_AM_MAX_ATTEMPTS is used instead of the hard-coding values.

          3. Assert the set maxAttempts > 1 where one and more than one will make difference.

          Show
          Zhijie Shen added a comment - I've drafted a patch, which includes the following modifications: 1. Change the default value of yarn.resourcemanager.am.max-attempts from 1 to 2. 2. In the test cases, where more than one attempt is set, YarnConfiguration.DEFAULT_RM_AM_MAX_ATTEMPTS is used instead of the hard-coding values. 3. Assert the set maxAttempts > 1 where one and more than one will make difference.
          Hide
          Hadoop QA added a comment -

          +1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12578289/YARN-542.1.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 3 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-YARN-Build/718//testReport/
          Console output: https://builds.apache.org/job/PreCommit-YARN-Build/718//console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - +1 overall . Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12578289/YARN-542.1.patch against trunk revision . +1 @author . The patch does not contain any @author tags. +1 tests included . The patch appears to include 3 new or modified test files. +1 javac . The applied patch does not increase the total number of javac compiler warnings. +1 javadoc . The javadoc tool did not generate any warning messages. +1 eclipse:eclipse . The patch built with eclipse:eclipse. +1 findbugs . The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit . The applied patch does not increase the total number of release audit warnings. +1 core tests . The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. +1 contrib tests . The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/718//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/718//console This message is automatically generated.
          Hide
          Vinod Kumar Vavilapalli added a comment -

          +1, this looks good. Checking it in.

          Show
          Vinod Kumar Vavilapalli added a comment - +1, this looks good. Checking it in.
          Hide
          Vinod Kumar Vavilapalli added a comment -

          Committed this to trunk and branch-2. Thanks Zhijie!

          Show
          Vinod Kumar Vavilapalli added a comment - Committed this to trunk and branch-2. Thanks Zhijie!
          Hide
          Hudson added a comment -

          Integrated in Hadoop-trunk-Commit #3642 (See https://builds.apache.org/job/Hadoop-trunk-Commit/3642/)
          YARN-542. Changed the default global AM max-attempts value to be not one. Contributed by Zhijie Shen. (Revision 1470094)

          Result = SUCCESS
          vinodkv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1470094
          Files :

          • /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
          • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java
          • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml
          • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMRestart.java
          • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/TestRMAppTransitions.java
          • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServicesApps.java
          Show
          Hudson added a comment - Integrated in Hadoop-trunk-Commit #3642 (See https://builds.apache.org/job/Hadoop-trunk-Commit/3642/ ) YARN-542 . Changed the default global AM max-attempts value to be not one. Contributed by Zhijie Shen. (Revision 1470094) Result = SUCCESS vinodkv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1470094 Files : /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMRestart.java /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/TestRMAppTransitions.java /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServicesApps.java
          Hide
          Hudson added a comment -

          Integrated in Hadoop-Yarn-trunk #189 (See https://builds.apache.org/job/Hadoop-Yarn-trunk/189/)
          YARN-542. Changed the default global AM max-attempts value to be not one. Contributed by Zhijie Shen. (Revision 1470094)

          Result = SUCCESS
          vinodkv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1470094
          Files :

          • /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
          • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java
          • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml
          • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMRestart.java
          • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/TestRMAppTransitions.java
          • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServicesApps.java
          Show
          Hudson added a comment - Integrated in Hadoop-Yarn-trunk #189 (See https://builds.apache.org/job/Hadoop-Yarn-trunk/189/ ) YARN-542 . Changed the default global AM max-attempts value to be not one. Contributed by Zhijie Shen. (Revision 1470094) Result = SUCCESS vinodkv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1470094 Files : /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMRestart.java /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/TestRMAppTransitions.java /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServicesApps.java
          Hide
          Hudson added a comment -

          Integrated in Hadoop-Hdfs-trunk #1378 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk/1378/)
          YARN-542. Changed the default global AM max-attempts value to be not one. Contributed by Zhijie Shen. (Revision 1470094)

          Result = FAILURE
          vinodkv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1470094
          Files :

          • /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
          • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java
          • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml
          • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMRestart.java
          • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/TestRMAppTransitions.java
          • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServicesApps.java
          Show
          Hudson added a comment - Integrated in Hadoop-Hdfs-trunk #1378 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk/1378/ ) YARN-542 . Changed the default global AM max-attempts value to be not one. Contributed by Zhijie Shen. (Revision 1470094) Result = FAILURE vinodkv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1470094 Files : /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMRestart.java /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/TestRMAppTransitions.java /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServicesApps.java
          Hide
          Hudson added a comment -

          Integrated in Hadoop-Mapreduce-trunk #1405 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1405/)
          YARN-542. Changed the default global AM max-attempts value to be not one. Contributed by Zhijie Shen. (Revision 1470094)

          Result = SUCCESS
          vinodkv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1470094
          Files :

          • /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
          • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java
          • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml
          • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMRestart.java
          • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/TestRMAppTransitions.java
          • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServicesApps.java
          Show
          Hudson added a comment - Integrated in Hadoop-Mapreduce-trunk #1405 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1405/ ) YARN-542 . Changed the default global AM max-attempts value to be not one. Contributed by Zhijie Shen. (Revision 1470094) Result = SUCCESS vinodkv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1470094 Files : /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMRestart.java /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/TestRMAppTransitions.java /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServicesApps.java
          Hide
          Bikas Saha added a comment -

          I think that the global default should be one because an app attempt can fail due to a large number of user level reasons and should not be retried. The system should retry an attempt upon hardware failure or YARN error and set this default to 1.

          Show
          Bikas Saha added a comment - I think that the global default should be one because an app attempt can fail due to a large number of user level reasons and should not be retried. The system should retry an attempt upon hardware failure or YARN error and set this default to 1.

            People

            • Assignee:
              Zhijie Shen
              Reporter:
              Vinod Kumar Vavilapalli
            • Votes:
              0 Vote for this issue
              Watchers:
              8 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development