Uploaded image for project: 'Hadoop YARN'
  1. Hadoop YARN
  2. YARN-2019

Retrospect on decision of making RM crashed if any exception throw in ZKRMStateStore

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.8.0, 2.7.2, 2.6.2, 3.0.0-alpha1
    • Component/s: None
    • Labels:
    • Hadoop Flags:
      Reviewed

      Description

      Currently, if any abnormal happens in ZKRMStateStore, it will throw a fetal exception to crash RM down. As shown in YARN-1924, it could due to RM HA internal bug itself, but not fatal exception. We should retrospect some decision here as HA feature is designed to protect key component but not disturb it.

      1. YARN-2019.patch
        4 kB
        Jian He
      2. YARN-2019.patch
        5 kB
        Jian He
      3. YARN-2019.1-wip.patch
        2 kB
        Tsuyoshi Ozawa

        Issue Links

          Activity

          Hide
          bikassaha Bikas Saha added a comment -

          That was the initial code since there was no HA states then. With HA states, the RM should move into standby state upon store error.

          Show
          bikassaha Bikas Saha added a comment - That was the initial code since there was no HA states then. With HA states, the RM should move into standby state upon store error.
          Hide
          djp Junping Du added a comment -

          The bad news could be: the exception could be repeated on new active RM as ZKRMStateStore is shared. Am I missing anything here?

          Show
          djp Junping Du added a comment - The bad news could be: the exception could be repeated on new active RM as ZKRMStateStore is shared. Am I missing anything here?
          Hide
          ozawa Tsuyoshi Ozawa added a comment -

          RMStateStore handles the exceptions in ZKRMStateStore like this:

              try {
                // ZK related operations
                removeRMDTMasterKeyState(delegationKey);
              } catch (Exception e) {
                notifyStoreOperationFailed(e);
              }
          

          If it's fenced, RMFatalEventDispatcher handles the exceptions and RM goes into standby state. However, if STATE_STORE_OP_FAILED occurs, Active RM terminates. After fail-over to standby RM, the exception could be repeated on new active RM. Maybe this is the case Junping Du mentioned. Please correct me if I get wrong.

            @Private
            public static class RMFatalEventDispatcher
                implements EventHandler<RMFatalEvent> {
              @Override
              public void handle(RMFatalEvent event) {
                LOG.fatal("Received a " + RMFatalEvent.class.getName() + " of type " +
                    event.getType().name() + ". Cause:\n" + event.getCause());
          
                if (event.getType() == RMFatalEventType.STATE_STORE_FENCED) {
                  LOG.info("RMStateStore has been fenced");
                  if (rmContext.isHAEnabled()) {
                    try {
                      // Transition to standby and reinit active services
                      LOG.info("Transitioning RM to Standby mode");
                      rm.transitionToStandby(true);
                      return;
                    } catch (Exception e) {
                      LOG.fatal("Failed to transition RM to Standby mode.");
                    }
                  }
                }
          
                ExitUtil.terminate(1, event.getCause());
              }
            }
          
          Show
          ozawa Tsuyoshi Ozawa added a comment - RMStateStore handles the exceptions in ZKRMStateStore like this: try { // ZK related operations removeRMDTMasterKeyState(delegationKey); } catch (Exception e) { notifyStoreOperationFailed(e); } If it's fenced, RMFatalEventDispatcher handles the exceptions and RM goes into standby state. However, if STATE_STORE_OP_FAILED occurs, Active RM terminates. After fail-over to standby RM, the exception could be repeated on new active RM. Maybe this is the case Junping Du mentioned. Please correct me if I get wrong. @Private public static class RMFatalEventDispatcher implements EventHandler<RMFatalEvent> { @Override public void handle(RMFatalEvent event) { LOG.fatal( "Received a " + RMFatalEvent.class.getName() + " of type " + event.getType().name() + ". Cause:\n" + event.getCause()); if (event.getType() == RMFatalEventType.STATE_STORE_FENCED) { LOG.info( "RMStateStore has been fenced" ); if (rmContext.isHAEnabled()) { try { // Transition to standby and reinit active services LOG.info( "Transitioning RM to Standby mode" ); rm.transitionToStandby( true ); return ; } catch (Exception e) { LOG.fatal( "Failed to transition RM to Standby mode." ); } } } ExitUtil.terminate(1, event.getCause()); } }
          Hide
          ozawa Tsuyoshi Ozawa added a comment -

          This means that all RM can terminates when ZK cannot be accessed from RMs. If we should retry until ZK come up, one solution is handling STATE_STORE_OP_FAILED in RMFatalEventDispatcher and going into standby state. Please see an attached patch .

          Show
          ozawa Tsuyoshi Ozawa added a comment - This means that all RM can terminates when ZK cannot be accessed from RMs. If we should retry until ZK come up, one solution is handling STATE_STORE_OP_FAILED in RMFatalEventDispatcher and going into standby state. Please see an attached patch .
          Hide
          kkambatl Karthik Kambatla (Inactive) added a comment -

          Junping Du - any particular ideas on how this should behave?

          Show
          kkambatl Karthik Kambatla (Inactive) added a comment - Junping Du - any particular ideas on how this should behave?
          Hide
          djp Junping Du added a comment -

          Karthik Kambatla, sorry that I ignored your comments as my email/company changed during that time. My thought on right behave is:
          If any issue in ZK cluster side, although it is distributed and should be more robust but could down due to bug or bad configuration, we can let ActiveRM continue to run as no-HA case. In addition, we should report Admin that the HA is not playing well, and let admin to decide when it is the proper timeline to bring down RM and reconfigure the HA things. Make sense?

          Show
          djp Junping Du added a comment - Karthik Kambatla , sorry that I ignored your comments as my email/company changed during that time. My thought on right behave is: If any issue in ZK cluster side, although it is distributed and should be more robust but could down due to bug or bad configuration, we can let ActiveRM continue to run as no-HA case. In addition, we should report Admin that the HA is not playing well, and let admin to decide when it is the proper timeline to bring down RM and reconfigure the HA things. Make sense?
          Hide
          jianhe Jian He added a comment -

          I agree that we should provide an option to not crash RM if some exception happened in state-store.

          Show
          jianhe Jian He added a comment - I agree that we should provide an option to not crash RM if some exception happened in state-store.
          Hide
          kasha Karthik Kambatla added a comment -

          How about adopting the approach proposed in YARN-3607?

          Show
          kasha Karthik Kambatla added a comment - How about adopting the approach proposed in YARN-3607 ?
          Hide
          djp Junping Du added a comment -

          +1 on general idea of YARN-3607. However, here users may have three options actually when facing error of ZKRMStateStore:
          1. aggressive to fail RM daemon;
          2. conservative to only log these errors without failed RM daemon and any applications;
          3. relative conservative - not failed RM but failed application in some cases (like RM get restarted).
          These choices may hint we may not want to force the policy of handling on all failures into a single configuration, although I agree we should combine/consolidate them as many as possible like what proposed by YARN-3607.
          Particularly in this case, I may prefer to add a separated configuration (may be something like: a boolean value for "yarn.resourcemanager.state-store.exit-on-error" or an enum value for "yarn.resourcemanager.state-store.policy-on-error"?) to allow user to choose when facing RM state store failures. So user got other options for other failure cases.

          Show
          djp Junping Du added a comment - +1 on general idea of YARN-3607 . However, here users may have three options actually when facing error of ZKRMStateStore: 1. aggressive to fail RM daemon; 2. conservative to only log these errors without failed RM daemon and any applications; 3. relative conservative - not failed RM but failed application in some cases (like RM get restarted). These choices may hint we may not want to force the policy of handling on all failures into a single configuration, although I agree we should combine/consolidate them as many as possible like what proposed by YARN-3607 . Particularly in this case, I may prefer to add a separated configuration (may be something like: a boolean value for "yarn.resourcemanager.state-store.exit-on-error" or an enum value for "yarn.resourcemanager.state-store.policy-on-error"?) to allow user to choose when facing RM state store failures. So user got other options for other failure cases.
          Hide
          kasha Karthik Kambatla added a comment -

          We could have two fail-fast configs - one for daemon and one for app/container. If we could do with general fail-fast configs, we should try and avoid adding component-specific configs; otherwise, we ll end up making configuring Yarn even harder.

          Show
          kasha Karthik Kambatla added a comment - We could have two fail-fast configs - one for daemon and one for app/container. If we could do with general fail-fast configs, we should try and avoid adding component-specific configs; otherwise, we ll end up making configuring Yarn even harder.
          Hide
          djp Junping Du added a comment -

          If so, I think we should at least differentiate RM and NM policies - user could be conservative to RM state store failure but be aggressive to NM state store failure. May be using "yarn.resourcemanager.fail-fast" here? Then we can use "yarn.nodemanager.fail-fast" later and may for other daemons (timeline service, etc.).

          Show
          djp Junping Du added a comment - If so, I think we should at least differentiate RM and NM policies - user could be conservative to RM state store failure but be aggressive to NM state store failure. May be using "yarn.resourcemanager.fail-fast" here? Then we can use "yarn.nodemanager.fail-fast" later and may for other daemons (timeline service, etc.).
          Hide
          kasha Karthik Kambatla added a comment -

          Sure. A per-daemon config sounds very reasonable. We could have one each for resourcemanager, nodemanager, timelineserver, app

          Show
          kasha Karthik Kambatla added a comment - Sure. A per-daemon config sounds very reasonable. We could have one each for resourcemanager, nodemanager, timelineserver, app
          Hide
          kasha Karthik Kambatla added a comment -

          And, may be also add a yarn.fail-fast as convenience method that rest would pick up from if not explicitly set.

          Show
          kasha Karthik Kambatla added a comment - And, may be also add a yarn.fail-fast as convenience method that rest would pick up from if not explicitly set.
          Hide
          jianhe Jian He added a comment -

          uploaded a patch which adds the "yarn.resourcemanager.fail-fast" config

          Show
          jianhe Jian He added a comment - uploaded a patch which adds the "yarn.resourcemanager.fail-fast" config
          Hide
          kasha Karthik Kambatla added a comment -

          Comments on the patch:

          1. Instead of having a separate default for all daemons, can all of them default to yarn.fail-fast? The default for yarn.fail-fast could be true?
          2. Should we have convenience methods in YarnConfiguration to fetch the fail-fast value for individual daemons. e.g. shouldRMFailFast()
          Show
          kasha Karthik Kambatla added a comment - Comments on the patch: Instead of having a separate default for all daemons, can all of them default to yarn.fail-fast? The default for yarn.fail-fast could be true? Should we have convenience methods in YarnConfiguration to fetch the fail-fast value for individual daemons. e.g. shouldRMFailFast()
          Hide
          jianhe Jian He added a comment -

          thanks for the review, Karthik !
          Addressed both comments.

          Show
          jianhe Jian He added a comment - thanks for the review, Karthik ! Addressed both comments.
          Hide
          kasha Karthik Kambatla added a comment -

          +1, pending Jenkins.

          Show
          kasha Karthik Kambatla added a comment - +1, pending Jenkins.
          Hide
          hadoopqa Hadoop QA added a comment -



          -1 overall



          Vote Subsystem Runtime Comment
          0 pre-patch 18m 53s Pre-patch trunk compilation is healthy.
          +1 @author 0m 0s The patch does not contain any @author tags.
          -1 tests included 0m 0s The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
          +1 javac 7m 40s There were no new javac warning messages.
          +1 javadoc 9m 37s There were no new javadoc warning messages.
          +1 release audit 0m 23s The applied patch does not increase the total number of release audit warnings.
          -1 checkstyle 1m 49s The applied patch generated 1 new checkstyle issues (total was 211, now 211).
          +1 whitespace 0m 0s The patch has no lines that end in whitespace.
          +1 install 1m 23s mvn install still works.
          +1 eclipse:eclipse 0m 33s The patch built with eclipse:eclipse.
          +1 findbugs 4m 33s The patch does not introduce any new Findbugs (version 3.0.0) warnings.
          +1 yarn tests 0m 23s Tests passed in hadoop-yarn-api.
          +1 yarn tests 1m 57s Tests passed in hadoop-yarn-common.
          +1 yarn tests 51m 54s Tests passed in hadoop-yarn-server-resourcemanager.
              99m 47s  



          Subsystem Report/Notes
          Patch URL http://issues.apache.org/jira/secure/attachment/12746639/YARN-2019.patch
          Optional Tests javadoc javac unit findbugs checkstyle
          git revision trunk / 06e5dd2
          checkstyle https://builds.apache.org/job/PreCommit-YARN-Build/8623/artifact/patchprocess/diffcheckstylehadoop-yarn-api.txt
          hadoop-yarn-api test log https://builds.apache.org/job/PreCommit-YARN-Build/8623/artifact/patchprocess/testrun_hadoop-yarn-api.txt
          hadoop-yarn-common test log https://builds.apache.org/job/PreCommit-YARN-Build/8623/artifact/patchprocess/testrun_hadoop-yarn-common.txt
          hadoop-yarn-server-resourcemanager test log https://builds.apache.org/job/PreCommit-YARN-Build/8623/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt
          Test Results https://builds.apache.org/job/PreCommit-YARN-Build/8623/testReport/
          Java 1.7.0_55
          uname Linux asf902.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
          Console output https://builds.apache.org/job/PreCommit-YARN-Build/8623/console

          This message was automatically generated.

          Show
          hadoopqa Hadoop QA added a comment - -1 overall Vote Subsystem Runtime Comment 0 pre-patch 18m 53s Pre-patch trunk compilation is healthy. +1 @author 0m 0s The patch does not contain any @author tags. -1 tests included 0m 0s The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. +1 javac 7m 40s There were no new javac warning messages. +1 javadoc 9m 37s There were no new javadoc warning messages. +1 release audit 0m 23s The applied patch does not increase the total number of release audit warnings. -1 checkstyle 1m 49s The applied patch generated 1 new checkstyle issues (total was 211, now 211). +1 whitespace 0m 0s The patch has no lines that end in whitespace. +1 install 1m 23s mvn install still works. +1 eclipse:eclipse 0m 33s The patch built with eclipse:eclipse. +1 findbugs 4m 33s The patch does not introduce any new Findbugs (version 3.0.0) warnings. +1 yarn tests 0m 23s Tests passed in hadoop-yarn-api. +1 yarn tests 1m 57s Tests passed in hadoop-yarn-common. +1 yarn tests 51m 54s Tests passed in hadoop-yarn-server-resourcemanager.     99m 47s   Subsystem Report/Notes Patch URL http://issues.apache.org/jira/secure/attachment/12746639/YARN-2019.patch Optional Tests javadoc javac unit findbugs checkstyle git revision trunk / 06e5dd2 checkstyle https://builds.apache.org/job/PreCommit-YARN-Build/8623/artifact/patchprocess/diffcheckstylehadoop-yarn-api.txt hadoop-yarn-api test log https://builds.apache.org/job/PreCommit-YARN-Build/8623/artifact/patchprocess/testrun_hadoop-yarn-api.txt hadoop-yarn-common test log https://builds.apache.org/job/PreCommit-YARN-Build/8623/artifact/patchprocess/testrun_hadoop-yarn-common.txt hadoop-yarn-server-resourcemanager test log https://builds.apache.org/job/PreCommit-YARN-Build/8623/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt Test Results https://builds.apache.org/job/PreCommit-YARN-Build/8623/testReport/ Java 1.7.0_55 uname Linux asf902.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux Console output https://builds.apache.org/job/PreCommit-YARN-Build/8623/console This message was automatically generated.
          Hide
          hadoopqa Hadoop QA added a comment -



          -1 overall



          Vote Subsystem Runtime Comment
          0 pre-patch 18m 44s Pre-patch trunk compilation is healthy.
          +1 @author 0m 0s The patch does not contain any @author tags.
          -1 tests included 0m 0s The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
          +1 javac 7m 38s There were no new javac warning messages.
          +1 javadoc 9m 37s There were no new javadoc warning messages.
          +1 release audit 0m 22s The applied patch does not increase the total number of release audit warnings.
          -1 checkstyle 1m 52s The applied patch generated 1 new checkstyle issues (total was 211, now 211).
          +1 whitespace 0m 0s The patch has no lines that end in whitespace.
          +1 install 1m 21s mvn install still works.
          +1 eclipse:eclipse 0m 33s The patch built with eclipse:eclipse.
          +1 findbugs 4m 33s The patch does not introduce any new Findbugs (version 3.0.0) warnings.
          +1 yarn tests 0m 22s Tests passed in hadoop-yarn-api.
          +1 yarn tests 1m 57s Tests passed in hadoop-yarn-common.
          +1 yarn tests 51m 55s Tests passed in hadoop-yarn-server-resourcemanager.
              99m 36s  



          Subsystem Report/Notes
          Patch URL http://issues.apache.org/jira/secure/attachment/12746651/YARN-2019.patch
          Optional Tests javadoc javac unit findbugs checkstyle
          git revision trunk / 06e5dd2
          checkstyle https://builds.apache.org/job/PreCommit-YARN-Build/8624/artifact/patchprocess/diffcheckstylehadoop-yarn-api.txt
          hadoop-yarn-api test log https://builds.apache.org/job/PreCommit-YARN-Build/8624/artifact/patchprocess/testrun_hadoop-yarn-api.txt
          hadoop-yarn-common test log https://builds.apache.org/job/PreCommit-YARN-Build/8624/artifact/patchprocess/testrun_hadoop-yarn-common.txt
          hadoop-yarn-server-resourcemanager test log https://builds.apache.org/job/PreCommit-YARN-Build/8624/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt
          Test Results https://builds.apache.org/job/PreCommit-YARN-Build/8624/testReport/
          Java 1.7.0_55
          uname Linux asf906.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
          Console output https://builds.apache.org/job/PreCommit-YARN-Build/8624/console

          This message was automatically generated.

          Show
          hadoopqa Hadoop QA added a comment - -1 overall Vote Subsystem Runtime Comment 0 pre-patch 18m 44s Pre-patch trunk compilation is healthy. +1 @author 0m 0s The patch does not contain any @author tags. -1 tests included 0m 0s The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. +1 javac 7m 38s There were no new javac warning messages. +1 javadoc 9m 37s There were no new javadoc warning messages. +1 release audit 0m 22s The applied patch does not increase the total number of release audit warnings. -1 checkstyle 1m 52s The applied patch generated 1 new checkstyle issues (total was 211, now 211). +1 whitespace 0m 0s The patch has no lines that end in whitespace. +1 install 1m 21s mvn install still works. +1 eclipse:eclipse 0m 33s The patch built with eclipse:eclipse. +1 findbugs 4m 33s The patch does not introduce any new Findbugs (version 3.0.0) warnings. +1 yarn tests 0m 22s Tests passed in hadoop-yarn-api. +1 yarn tests 1m 57s Tests passed in hadoop-yarn-common. +1 yarn tests 51m 55s Tests passed in hadoop-yarn-server-resourcemanager.     99m 36s   Subsystem Report/Notes Patch URL http://issues.apache.org/jira/secure/attachment/12746651/YARN-2019.patch Optional Tests javadoc javac unit findbugs checkstyle git revision trunk / 06e5dd2 checkstyle https://builds.apache.org/job/PreCommit-YARN-Build/8624/artifact/patchprocess/diffcheckstylehadoop-yarn-api.txt hadoop-yarn-api test log https://builds.apache.org/job/PreCommit-YARN-Build/8624/artifact/patchprocess/testrun_hadoop-yarn-api.txt hadoop-yarn-common test log https://builds.apache.org/job/PreCommit-YARN-Build/8624/artifact/patchprocess/testrun_hadoop-yarn-common.txt hadoop-yarn-server-resourcemanager test log https://builds.apache.org/job/PreCommit-YARN-Build/8624/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt Test Results https://builds.apache.org/job/PreCommit-YARN-Build/8624/testReport/ Java 1.7.0_55 uname Linux asf906.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux Console output https://builds.apache.org/job/PreCommit-YARN-Build/8624/console This message was automatically generated.
          Hide
          djp Junping Du added a comment -

          +1. Committing it in.

          Show
          djp Junping Du added a comment - +1. Committing it in.
          Hide
          djp Junping Du added a comment -

          I have commit latest patch to trunk and branch-2. Thanks Jian He for contributing the patch and Karthik Kambatla for review and comment!

          Show
          djp Junping Du added a comment - I have commit latest patch to trunk and branch-2. Thanks Jian He for contributing the patch and Karthik Kambatla for review and comment!
          Hide
          hudson Hudson added a comment -

          SUCCESS: Integrated in Hadoop-trunk-Commit #8204 (See https://builds.apache.org/job/Hadoop-trunk-Commit/8204/)
          YARN-2019. Retrospect on decision of making RM crashed if any exception throw in ZKRMStateStore. Contributed by Jian He. (junping_du: rev ee98d6354bbbcd0832d3e539ee097f837e5d0e31)

          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateStore.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java
          • hadoop-yarn-project/CHANGES.txt
          Show
          hudson Hudson added a comment - SUCCESS: Integrated in Hadoop-trunk-Commit #8204 (See https://builds.apache.org/job/Hadoop-trunk-Commit/8204/ ) YARN-2019 . Retrospect on decision of making RM crashed if any exception throw in ZKRMStateStore. Contributed by Jian He. (junping_du: rev ee98d6354bbbcd0832d3e539ee097f837e5d0e31) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateStore.java hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java hadoop-yarn-project/CHANGES.txt
          Hide
          hudson Hudson added a comment -

          FAILURE: Integrated in Hadoop-Yarn-trunk #995 (See https://builds.apache.org/job/Hadoop-Yarn-trunk/995/)
          YARN-2019. Retrospect on decision of making RM crashed if any exception throw in ZKRMStateStore. Contributed by Jian He. (junping_du: rev ee98d6354bbbcd0832d3e539ee097f837e5d0e31)

          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java
          • hadoop-yarn-project/CHANGES.txt
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateStore.java
          Show
          hudson Hudson added a comment - FAILURE: Integrated in Hadoop-Yarn-trunk #995 (See https://builds.apache.org/job/Hadoop-Yarn-trunk/995/ ) YARN-2019 . Retrospect on decision of making RM crashed if any exception throw in ZKRMStateStore. Contributed by Jian He. (junping_du: rev ee98d6354bbbcd0832d3e539ee097f837e5d0e31) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java hadoop-yarn-project/CHANGES.txt hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateStore.java
          Hide
          hudson Hudson added a comment -

          SUCCESS: Integrated in Hadoop-Yarn-trunk-Java8 #265 (See https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/265/)
          YARN-2019. Retrospect on decision of making RM crashed if any exception throw in ZKRMStateStore. Contributed by Jian He. (junping_du: rev ee98d6354bbbcd0832d3e539ee097f837e5d0e31)

          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateStore.java
          • hadoop-yarn-project/CHANGES.txt
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java
          Show
          hudson Hudson added a comment - SUCCESS: Integrated in Hadoop-Yarn-trunk-Java8 #265 (See https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/265/ ) YARN-2019 . Retrospect on decision of making RM crashed if any exception throw in ZKRMStateStore. Contributed by Jian He. (junping_du: rev ee98d6354bbbcd0832d3e539ee097f837e5d0e31) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateStore.java hadoop-yarn-project/CHANGES.txt hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java
          Hide
          hudson Hudson added a comment -

          FAILURE: Integrated in Hadoop-Hdfs-trunk #2192 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk/2192/)
          YARN-2019. Retrospect on decision of making RM crashed if any exception throw in ZKRMStateStore. Contributed by Jian He. (junping_du: rev ee98d6354bbbcd0832d3e539ee097f837e5d0e31)

          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java
          • hadoop-yarn-project/CHANGES.txt
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateStore.java
          Show
          hudson Hudson added a comment - FAILURE: Integrated in Hadoop-Hdfs-trunk #2192 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk/2192/ ) YARN-2019 . Retrospect on decision of making RM crashed if any exception throw in ZKRMStateStore. Contributed by Jian He. (junping_du: rev ee98d6354bbbcd0832d3e539ee097f837e5d0e31) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java hadoop-yarn-project/CHANGES.txt hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateStore.java
          Hide
          hudson Hudson added a comment -

          FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #254 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/254/)
          YARN-2019. Retrospect on decision of making RM crashed if any exception throw in ZKRMStateStore. Contributed by Jian He. (junping_du: rev ee98d6354bbbcd0832d3e539ee097f837e5d0e31)

          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateStore.java
          • hadoop-yarn-project/CHANGES.txt
          Show
          hudson Hudson added a comment - FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #254 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/254/ ) YARN-2019 . Retrospect on decision of making RM crashed if any exception throw in ZKRMStateStore. Contributed by Jian He. (junping_du: rev ee98d6354bbbcd0832d3e539ee097f837e5d0e31) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateStore.java hadoop-yarn-project/CHANGES.txt
          Hide
          hudson Hudson added a comment -

          SUCCESS: Integrated in Hadoop-Mapreduce-trunk-Java8 #262 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/262/)
          YARN-2019. Retrospect on decision of making RM crashed if any exception throw in ZKRMStateStore. Contributed by Jian He. (junping_du: rev ee98d6354bbbcd0832d3e539ee097f837e5d0e31)

          • hadoop-yarn-project/CHANGES.txt
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateStore.java
          Show
          hudson Hudson added a comment - SUCCESS: Integrated in Hadoop-Mapreduce-trunk-Java8 #262 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/262/ ) YARN-2019 . Retrospect on decision of making RM crashed if any exception throw in ZKRMStateStore. Contributed by Jian He. (junping_du: rev ee98d6354bbbcd0832d3e539ee097f837e5d0e31) hadoop-yarn-project/CHANGES.txt hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateStore.java
          Hide
          hudson Hudson added a comment -

          SUCCESS: Integrated in Hadoop-Mapreduce-trunk #2211 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2211/)
          YARN-2019. Retrospect on decision of making RM crashed if any exception throw in ZKRMStateStore. Contributed by Jian He. (junping_du: rev ee98d6354bbbcd0832d3e539ee097f837e5d0e31)

          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateStore.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java
          • hadoop-yarn-project/CHANGES.txt
          Show
          hudson Hudson added a comment - SUCCESS: Integrated in Hadoop-Mapreduce-trunk #2211 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2211/ ) YARN-2019 . Retrospect on decision of making RM crashed if any exception throw in ZKRMStateStore. Contributed by Jian He. (junping_du: rev ee98d6354bbbcd0832d3e539ee097f837e5d0e31) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateStore.java hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java hadoop-yarn-project/CHANGES.txt
          Hide
          hudson Hudson added a comment -

          FAILURE: Integrated in Hadoop-trunk-Commit #8410 (See https://builds.apache.org/job/Hadoop-trunk-Commit/8410/)
          Modify the CHANGES.txt to move YARN-2019 Commit log from branch 2.8 to (xgong: rev 6a5068970567817af78d6ec2dfe474b09533a8d2)

          • hadoop-yarn-project/CHANGES.txt
          Show
          hudson Hudson added a comment - FAILURE: Integrated in Hadoop-trunk-Commit #8410 (See https://builds.apache.org/job/Hadoop-trunk-Commit/8410/ ) Modify the CHANGES.txt to move YARN-2019 Commit log from branch 2.8 to (xgong: rev 6a5068970567817af78d6ec2dfe474b09533a8d2) hadoop-yarn-project/CHANGES.txt
          Hide
          xgong Xuan Gong added a comment -

          Also committed it into branch-2.7/branch-2.6

          Show
          xgong Xuan Gong added a comment - Also committed it into branch-2.7/branch-2.6
          Hide
          hudson Hudson added a comment -

          FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #358 (See https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/358/)
          Modify the CHANGES.txt to move YARN-2019 Commit log from branch 2.8 to (xgong: rev 6a5068970567817af78d6ec2dfe474b09533a8d2)

          • hadoop-yarn-project/CHANGES.txt
          Show
          hudson Hudson added a comment - FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #358 (See https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/358/ ) Modify the CHANGES.txt to move YARN-2019 Commit log from branch 2.8 to (xgong: rev 6a5068970567817af78d6ec2dfe474b09533a8d2) hadoop-yarn-project/CHANGES.txt
          Hide
          hudson Hudson added a comment -

          FAILURE: Integrated in Hadoop-trunk-Commit #8411 (See https://builds.apache.org/job/Hadoop-trunk-Commit/8411/)
          YARN-4087. Followup fixes after YARN-2019 regarding RM behavior when (xgong: rev 9b78e6e33d8c117c1e909df414f20d9db56efe4b)

          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml
          • hadoop-yarn-project/CHANGES.txt
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateStore.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java
          Show
          hudson Hudson added a comment - FAILURE: Integrated in Hadoop-trunk-Commit #8411 (See https://builds.apache.org/job/Hadoop-trunk-Commit/8411/ ) YARN-4087 . Followup fixes after YARN-2019 regarding RM behavior when (xgong: rev 9b78e6e33d8c117c1e909df414f20d9db56efe4b) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml hadoop-yarn-project/CHANGES.txt hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateStore.java hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java
          Hide
          bikassaha Bikas Saha added a comment -

          Sorry for coming in late on this. There would be 2 kinds of state store operations - reads and writes. Writes may be of 2 kinds - critical and non-critical. E.g. saving an application submission is critical. Saving a node information is perhaps not critical. It would affect system correctness is critical write operation errors are allowed to be ignored. We end up with YARN-4118 and other such potential issues. Essentially we are turning a write-ahead log into something that optional. I dont see how the system can make stable reliability guarantees by making the write-ahead log non-fatal.
          On the other hand read errors or non-critical write errors should not block RM progress but do need to be potentially retried. That also does not seem to be addressed in the patch.

          Show
          bikassaha Bikas Saha added a comment - Sorry for coming in late on this. There would be 2 kinds of state store operations - reads and writes. Writes may be of 2 kinds - critical and non-critical. E.g. saving an application submission is critical. Saving a node information is perhaps not critical. It would affect system correctness is critical write operation errors are allowed to be ignored. We end up with YARN-4118 and other such potential issues. Essentially we are turning a write-ahead log into something that optional. I dont see how the system can make stable reliability guarantees by making the write-ahead log non-fatal. On the other hand read errors or non-critical write errors should not block RM progress but do need to be potentially retried. That also does not seem to be addressed in the patch.
          Hide
          hudson Hudson added a comment -

          FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #359 (See https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/359/)
          YARN-4087. Followup fixes after YARN-2019 regarding RM behavior when (xgong: rev 9b78e6e33d8c117c1e909df414f20d9db56efe4b)

          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java
          • hadoop-yarn-project/CHANGES.txt
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateStore.java
          Show
          hudson Hudson added a comment - FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #359 (See https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/359/ ) YARN-4087 . Followup fixes after YARN-2019 regarding RM behavior when (xgong: rev 9b78e6e33d8c117c1e909df414f20d9db56efe4b) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java hadoop-yarn-project/CHANGES.txt hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateStore.java
          Hide
          hudson Hudson added a comment -

          SUCCESS: Integrated in Hadoop-Yarn-trunk #1090 (See https://builds.apache.org/job/Hadoop-Yarn-trunk/1090/)
          Modify the CHANGES.txt to move YARN-2019 Commit log from branch 2.8 to (xgong: rev 6a5068970567817af78d6ec2dfe474b09533a8d2)

          • hadoop-yarn-project/CHANGES.txt
            YARN-4087. Followup fixes after YARN-2019 regarding RM behavior when (xgong: rev 9b78e6e33d8c117c1e909df414f20d9db56efe4b)
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateStore.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java
          • hadoop-yarn-project/CHANGES.txt
          Show
          hudson Hudson added a comment - SUCCESS: Integrated in Hadoop-Yarn-trunk #1090 (See https://builds.apache.org/job/Hadoop-Yarn-trunk/1090/ ) Modify the CHANGES.txt to move YARN-2019 Commit log from branch 2.8 to (xgong: rev 6a5068970567817af78d6ec2dfe474b09533a8d2) hadoop-yarn-project/CHANGES.txt YARN-4087 . Followup fixes after YARN-2019 regarding RM behavior when (xgong: rev 9b78e6e33d8c117c1e909df414f20d9db56efe4b) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateStore.java hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java hadoop-yarn-project/CHANGES.txt
          Hide
          hudson Hudson added a comment -

          SUCCESS: Integrated in Hadoop-Mapreduce-trunk #2301 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2301/)
          Modify the CHANGES.txt to move YARN-2019 Commit log from branch 2.8 to (xgong: rev 6a5068970567817af78d6ec2dfe474b09533a8d2)

          • hadoop-yarn-project/CHANGES.txt
          Show
          hudson Hudson added a comment - SUCCESS: Integrated in Hadoop-Mapreduce-trunk #2301 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2301/ ) Modify the CHANGES.txt to move YARN-2019 Commit log from branch 2.8 to (xgong: rev 6a5068970567817af78d6ec2dfe474b09533a8d2) hadoop-yarn-project/CHANGES.txt
          Hide
          hudson Hudson added a comment -

          FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #340 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/340/)
          Modify the CHANGES.txt to move YARN-2019 Commit log from branch 2.8 to (xgong: rev 6a5068970567817af78d6ec2dfe474b09533a8d2)

          • hadoop-yarn-project/CHANGES.txt
          Show
          hudson Hudson added a comment - FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #340 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/340/ ) Modify the CHANGES.txt to move YARN-2019 Commit log from branch 2.8 to (xgong: rev 6a5068970567817af78d6ec2dfe474b09533a8d2) hadoop-yarn-project/CHANGES.txt
          Hide
          hudson Hudson added a comment -

          SUCCESS: Integrated in Hadoop-Mapreduce-trunk-Java8 #352 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/352/)
          Modify the CHANGES.txt to move YARN-2019 Commit log from branch 2.8 to (xgong: rev 6a5068970567817af78d6ec2dfe474b09533a8d2)

          • hadoop-yarn-project/CHANGES.txt
            YARN-4087. Followup fixes after YARN-2019 regarding RM behavior when (xgong: rev 9b78e6e33d8c117c1e909df414f20d9db56efe4b)
          • hadoop-yarn-project/CHANGES.txt
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateStore.java
          Show
          hudson Hudson added a comment - SUCCESS: Integrated in Hadoop-Mapreduce-trunk-Java8 #352 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/352/ ) Modify the CHANGES.txt to move YARN-2019 Commit log from branch 2.8 to (xgong: rev 6a5068970567817af78d6ec2dfe474b09533a8d2) hadoop-yarn-project/CHANGES.txt YARN-4087 . Followup fixes after YARN-2019 regarding RM behavior when (xgong: rev 9b78e6e33d8c117c1e909df414f20d9db56efe4b) hadoop-yarn-project/CHANGES.txt hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateStore.java
          Hide
          hudson Hudson added a comment -

          FAILURE: Integrated in Hadoop-Hdfs-trunk #2279 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk/2279/)
          Modify the CHANGES.txt to move YARN-2019 Commit log from branch 2.8 to (xgong: rev 6a5068970567817af78d6ec2dfe474b09533a8d2)

          • hadoop-yarn-project/CHANGES.txt
          Show
          hudson Hudson added a comment - FAILURE: Integrated in Hadoop-Hdfs-trunk #2279 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk/2279/ ) Modify the CHANGES.txt to move YARN-2019 Commit log from branch 2.8 to (xgong: rev 6a5068970567817af78d6ec2dfe474b09533a8d2) hadoop-yarn-project/CHANGES.txt
          Hide
          hudson Hudson added a comment -

          SUCCESS: Integrated in Hadoop-Mapreduce-trunk #2302 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2302/)
          YARN-4087. Followup fixes after YARN-2019 regarding RM behavior when (xgong: rev 9b78e6e33d8c117c1e909df414f20d9db56efe4b)

          • hadoop-yarn-project/CHANGES.txt
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateStore.java
          Show
          hudson Hudson added a comment - SUCCESS: Integrated in Hadoop-Mapreduce-trunk #2302 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2302/ ) YARN-4087 . Followup fixes after YARN-2019 regarding RM behavior when (xgong: rev 9b78e6e33d8c117c1e909df414f20d9db56efe4b) hadoop-yarn-project/CHANGES.txt hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateStore.java
          Hide
          hudson Hudson added a comment -

          FAILURE: Integrated in Hadoop-Hdfs-trunk #2280 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk/2280/)
          YARN-4087. Followup fixes after YARN-2019 regarding RM behavior when (xgong: rev 9b78e6e33d8c117c1e909df414f20d9db56efe4b)

          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml
          • hadoop-yarn-project/CHANGES.txt
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateStore.java
          Show
          hudson Hudson added a comment - FAILURE: Integrated in Hadoop-Hdfs-trunk #2280 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk/2280/ ) YARN-4087 . Followup fixes after YARN-2019 regarding RM behavior when (xgong: rev 9b78e6e33d8c117c1e909df414f20d9db56efe4b) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml hadoop-yarn-project/CHANGES.txt hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateStore.java
          Hide
          hudson Hudson added a comment -

          FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #341 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/341/)
          YARN-4087. Followup fixes after YARN-2019 regarding RM behavior when (xgong: rev 9b78e6e33d8c117c1e909df414f20d9db56efe4b)

          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java
          • hadoop-yarn-project/CHANGES.txt
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateStore.java
          Show
          hudson Hudson added a comment - FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #341 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/341/ ) YARN-4087 . Followup fixes after YARN-2019 regarding RM behavior when (xgong: rev 9b78e6e33d8c117c1e909df414f20d9db56efe4b) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java hadoop-yarn-project/CHANGES.txt hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateStore.java
          Hide
          bikassaha Bikas Saha added a comment -

          Does this now mean that during a failover the new RM could forget about the jobs that failed to get stored by the previous RM?

          Show
          bikassaha Bikas Saha added a comment - Does this now mean that during a failover the new RM could forget about the jobs that failed to get stored by the previous RM?

            People

            • Assignee:
              jianhe Jian He
              Reporter:
              djp Junping Du
            • Votes:
              0 Vote for this issue
              Watchers:
              15 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development