Uploaded image for project: 'Hadoop YARN'
  1. Hadoop YARN
  2. YARN-6061

Add an UncaughtExceptionHandler for critical threads in RM

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.9.0, 3.0.0-alpha4
    • Component/s: resourcemanager
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      There are several threads in fair scheduler. The thread will quit when there is a runtime exception inside it. We should bring down the RM when that happens. Otherwise, there may be some weird behavior in RM.

      1. YARN-6061.001.patch
        5 kB
        Yufei Gu
      2. YARN-6061.002.patch
        7 kB
        Yufei Gu
      3. YARN-6061.003.patch
        11 kB
        Yufei Gu
      4. YARN-6061.004.patch
        16 kB
        Yufei Gu
      5. YARN-6061.005.patch
        16 kB
        Yufei Gu
      6. YARN-6061.006.patch
        17 kB
        Yufei Gu
      7. YARN-6061.007.patch
        17 kB
        Yufei Gu
      8. YARN-6061.008.patch
        18 kB
        Yufei Gu
      9. YARN-6061.009.patch
        18 kB
        Yufei Gu
      10. YARN-6061.010.patch
        18 kB
        Yufei Gu

        Issue Links

          Activity

          Show
          devaraj.k Devaraj K added a comment - Should not handle this? https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceManager.java#L1378
          Hide
          yufeigu Yufei Gu added a comment - - edited

          Yes. This sets the default handler if no specific one is set for the thread. But we need a different handlers here. When YarnUncaughtExceptionHandler got a raw RuntimeException, it just logs an error, doesn't bring down the RM. This is fine for some threads, e.g. threads in a thread pool. But for other threads like update thread and preemption thread in fair scheduler, we should bring down the RM once a RTE is caught since there is no way RM still is running but these critical threads crashed.
          I realize that it should work for all critical threads(critical means we should bring down the RM if the thread crashed). Maybe we should enlarge the scope to RM instead of FS only.

          Karthik Kambatla, what do you think?

          Show
          yufeigu Yufei Gu added a comment - - edited Yes. This sets the default handler if no specific one is set for the thread. But we need a different handlers here. When YarnUncaughtExceptionHandler got a raw RuntimeException, it just logs an error, doesn't bring down the RM. This is fine for some threads, e.g. threads in a thread pool. But for other threads like update thread and preemption thread in fair scheduler, we should bring down the RM once a RTE is caught since there is no way RM still is running but these critical threads crashed. I realize that it should work for all critical threads(critical means we should bring down the RM if the thread crashed). Maybe we should enlarge the scope to RM instead of FS only. Karthik Kambatla , what do you think?
          Hide
          yufeigu Yufei Gu added a comment -

          Had an offline discussion with Karthik Kambatla. We agreed to enlarge the scope to yarn project. A new patch uploaded.

          Show
          yufeigu Yufei Gu added a comment - Had an offline discussion with Karthik Kambatla . We agreed to enlarge the scope to yarn project. A new patch uploaded.
          Hide
          hadoopqa Hadoop QA added a comment -
          +1 overall



          Vote Subsystem Runtime Comment
          0 reexec 0m 14s Docker mode activated.
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 test4tests 0m 0s The patch appears to include 1 new or modified test files.
          +1 mvninstall 12m 58s trunk passed
          +1 compile 0m 27s trunk passed
          +1 checkstyle 0m 19s trunk passed
          +1 mvnsite 0m 29s trunk passed
          +1 mvneclipse 0m 13s trunk passed
          +1 findbugs 0m 56s trunk passed
          +1 javadoc 0m 29s trunk passed
          +1 mvninstall 0m 25s the patch passed
          +1 compile 0m 24s the patch passed
          +1 javac 0m 24s the patch passed
          -0 checkstyle 0m 16s hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common: The patch generated 1 new + 0 unchanged - 0 fixed = 1 total (was 0)
          +1 mvnsite 0m 29s the patch passed
          +1 mvneclipse 0m 11s the patch passed
          +1 whitespace 0m 0s The patch has no whitespace issues.
          +1 findbugs 1m 4s the patch passed
          +1 javadoc 0m 26s the patch passed
          +1 unit 2m 21s hadoop-yarn-common in the patch passed.
          +1 asflicense 0m 15s The patch does not generate ASF License warnings.
          23m 8s



          Subsystem Report/Notes
          Docker Image:yetus/hadoop:a9ad5d6
          JIRA Issue YARN-6061
          JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12846437/YARN-6061.001.patch
          Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
          uname Linux e4b5d55dccba 3.13.0-103-generic #150-Ubuntu SMP Thu Nov 24 10:34:17 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
          Build tool maven
          Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
          git revision trunk / 91bf504
          Default Java 1.8.0_111
          findbugs v3.0.0
          checkstyle https://builds.apache.org/job/PreCommit-YARN-Build/14611/artifact/patchprocess/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-common.txt
          Test Results https://builds.apache.org/job/PreCommit-YARN-Build/14611/testReport/
          modules C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common
          Console output https://builds.apache.org/job/PreCommit-YARN-Build/14611/console
          Powered by Apache Yetus 0.5.0-SNAPSHOT http://yetus.apache.org

          This message was automatically generated.

          Show
          hadoopqa Hadoop QA added a comment - +1 overall Vote Subsystem Runtime Comment 0 reexec 0m 14s Docker mode activated. +1 @author 0m 0s The patch does not contain any @author tags. +1 test4tests 0m 0s The patch appears to include 1 new or modified test files. +1 mvninstall 12m 58s trunk passed +1 compile 0m 27s trunk passed +1 checkstyle 0m 19s trunk passed +1 mvnsite 0m 29s trunk passed +1 mvneclipse 0m 13s trunk passed +1 findbugs 0m 56s trunk passed +1 javadoc 0m 29s trunk passed +1 mvninstall 0m 25s the patch passed +1 compile 0m 24s the patch passed +1 javac 0m 24s the patch passed -0 checkstyle 0m 16s hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common: The patch generated 1 new + 0 unchanged - 0 fixed = 1 total (was 0) +1 mvnsite 0m 29s the patch passed +1 mvneclipse 0m 11s the patch passed +1 whitespace 0m 0s The patch has no whitespace issues. +1 findbugs 1m 4s the patch passed +1 javadoc 0m 26s the patch passed +1 unit 2m 21s hadoop-yarn-common in the patch passed. +1 asflicense 0m 15s The patch does not generate ASF License warnings. 23m 8s Subsystem Report/Notes Docker Image:yetus/hadoop:a9ad5d6 JIRA Issue YARN-6061 JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12846437/YARN-6061.001.patch Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle uname Linux e4b5d55dccba 3.13.0-103-generic #150-Ubuntu SMP Thu Nov 24 10:34:17 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux Build tool maven Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh git revision trunk / 91bf504 Default Java 1.8.0_111 findbugs v3.0.0 checkstyle https://builds.apache.org/job/PreCommit-YARN-Build/14611/artifact/patchprocess/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-common.txt Test Results https://builds.apache.org/job/PreCommit-YARN-Build/14611/testReport/ modules C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common Console output https://builds.apache.org/job/PreCommit-YARN-Build/14611/console Powered by Apache Yetus 0.5.0-SNAPSHOT http://yetus.apache.org This message was automatically generated.
          Hide
          kasha Karthik Kambatla added a comment -

          Yufei Gu - thanks for working on this. I must have misunderstood you.

          I am in favor of creating a RM-wide UncaughtExceptionHandler, that creates and sends an RMFatalEvent so the RM can either shutdown or transition-to-standby based on whether HA is enabled. This allows the StandbyRM to become Active and run so long as that also doesn't run into the same uncaught exception.

          Thinking more about this, on receiving a fatal event, the RM should also consult yarn.resourcemanager.failfast to decide whether to shutdown or transition to standby. That is likely another JIRA though.

          Show
          kasha Karthik Kambatla added a comment - Yufei Gu - thanks for working on this. I must have misunderstood you. I am in favor of creating a RM-wide UncaughtExceptionHandler, that creates and sends an RMFatalEvent so the RM can either shutdown or transition-to-standby based on whether HA is enabled. This allows the StandbyRM to become Active and run so long as that also doesn't run into the same uncaught exception. Thinking more about this, on receiving a fatal event, the RM should also consult yarn.resourcemanager.failfast to decide whether to shutdown or transition to standby. That is likely another JIRA though.
          Hide
          yufeigu Yufei Gu added a comment -

          Thanks Karthik Kambatla for the review. Misunderstanding may be on my side. I was trying to create terminate-program version of class YarnUncaughtExceptionHandler in last patch. If we narrow down to RM-wide, I agreed with you, to send a RMFatalEvent seems more graceful. But to do that, we need to modify the handler of RMFatalEvent as well because the current handler just terminates the program as the following code.

            public static class RMFatalEventDispatcher
                implements EventHandler<RMFatalEvent> {
          
              @Override
              public void handle(RMFatalEvent event) {
                LOG.fatal("Received a " + RMFatalEvent.class.getName() + " of type " +
                    event.getType().name() + ". Cause:\n" + event.getCause());
          
                ExitUtil.terminate(1, event.getCause());
              }
            }
          
          Show
          yufeigu Yufei Gu added a comment - Thanks Karthik Kambatla for the review. Misunderstanding may be on my side. I was trying to create terminate-program version of class YarnUncaughtExceptionHandler in last patch. If we narrow down to RM-wide, I agreed with you, to send a RMFatalEvent seems more graceful. But to do that, we need to modify the handler of RMFatalEvent as well because the current handler just terminates the program as the following code. public static class RMFatalEventDispatcher implements EventHandler<RMFatalEvent> { @Override public void handle(RMFatalEvent event) { LOG.fatal( "Received a " + RMFatalEvent.class.getName() + " of type " + event.getType().name() + ". Cause:\n" + event.getCause()); ExitUtil.terminate(1, event.getCause()); } }
          Hide
          yufeigu Yufei Gu added a comment -

          Uploaded patch 002. It create a RM-wide UncaughtExceptionHandler which send a RMFatalEvent to RM.

          Show
          yufeigu Yufei Gu added a comment - Uploaded patch 002. It create a RM-wide UncaughtExceptionHandler which send a RMFatalEvent to RM.
          Hide
          githubbot ASF GitHub Bot added a comment -

          GitHub user flyrain opened a pull request:

          https://github.com/apache/hadoop/pull/182

          YARN-6061. Add a customized uncaughtexceptionhandler for critical threads

          You can merge this pull request into a Git repository by running:

          $ git pull https://github.com/flyrain/hadoop yarn-6061

          Alternatively you can review and apply these changes as the patch at:

          https://github.com/apache/hadoop/pull/182.patch

          To close this pull request, make a commit to your master/trunk branch
          with (at least) the following in the commit message:

          This closes #182


          commit fbbc209b1450c7c22b04568f53b716e727df750a
          Author: Yufei Gu <yufei.gu@cloudera.com>
          Date: 2017-01-19T23:50:55Z

          YARN-6061. Add a customized uncaughtexceptionhandler for critical threads.


          Show
          githubbot ASF GitHub Bot added a comment - GitHub user flyrain opened a pull request: https://github.com/apache/hadoop/pull/182 YARN-6061 . Add a customized uncaughtexceptionhandler for critical threads You can merge this pull request into a Git repository by running: $ git pull https://github.com/flyrain/hadoop yarn-6061 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/hadoop/pull/182.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #182 commit fbbc209b1450c7c22b04568f53b716e727df750a Author: Yufei Gu <yufei.gu@cloudera.com> Date: 2017-01-19T23:50:55Z YARN-6061 . Add a customized uncaughtexceptionhandler for critical threads.
          Hide
          hadoopqa Hadoop QA added a comment -
          -1 overall



          Vote Subsystem Runtime Comment
          0 reexec 0m 18s Docker mode activated.
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 test4tests 0m 0s The patch appears to include 1 new or modified test files.
          +1 mvninstall 15m 28s trunk passed
          +1 compile 0m 39s trunk passed
          +1 checkstyle 0m 23s trunk passed
          +1 mvnsite 0m 41s trunk passed
          +1 mvneclipse 0m 20s trunk passed
          +1 findbugs 1m 15s trunk passed
          +1 javadoc 0m 29s trunk passed
          +1 mvninstall 0m 36s the patch passed
          +1 compile 0m 38s the patch passed
          +1 javac 0m 38s the patch passed
          -0 checkstyle 0m 19s hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: The patch generated 1 new + 1 unchanged - 0 fixed = 2 total (was 1)
          +1 mvnsite 0m 37s the patch passed
          +1 mvneclipse 0m 15s the patch passed
          +1 whitespace 0m 0s The patch has no whitespace issues.
          +1 findbugs 1m 18s the patch passed
          +1 javadoc 0m 21s the patch passed
          -1 unit 41m 55s hadoop-yarn-server-resourcemanager in the patch failed.
          +1 asflicense 0m 18s The patch does not generate ASF License warnings.
          67m 14s



          Reason Tests
          Failed junit tests hadoop.yarn.server.resourcemanager.TestRMRestart



          Subsystem Report/Notes
          Docker Image:yetus/hadoop:a9ad5d6
          JIRA Issue YARN-6061
          GITHUB PR https://github.com/apache/hadoop/pull/182
          Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
          uname Linux 76d62bc5f048 3.13.0-105-generic #152-Ubuntu SMP Fri Dec 2 15:37:11 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
          Build tool maven
          Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
          git revision trunk / 5d8b80e
          Default Java 1.8.0_111
          findbugs v3.0.0
          checkstyle https://builds.apache.org/job/PreCommit-YARN-Build/14713/artifact/patchprocess/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt
          unit https://builds.apache.org/job/PreCommit-YARN-Build/14713/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt
          Test Results https://builds.apache.org/job/PreCommit-YARN-Build/14713/testReport/
          modules C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
          Console output https://builds.apache.org/job/PreCommit-YARN-Build/14713/console
          Powered by Apache Yetus 0.5.0-SNAPSHOT http://yetus.apache.org

          This message was automatically generated.

          Show
          hadoopqa Hadoop QA added a comment - -1 overall Vote Subsystem Runtime Comment 0 reexec 0m 18s Docker mode activated. +1 @author 0m 0s The patch does not contain any @author tags. +1 test4tests 0m 0s The patch appears to include 1 new or modified test files. +1 mvninstall 15m 28s trunk passed +1 compile 0m 39s trunk passed +1 checkstyle 0m 23s trunk passed +1 mvnsite 0m 41s trunk passed +1 mvneclipse 0m 20s trunk passed +1 findbugs 1m 15s trunk passed +1 javadoc 0m 29s trunk passed +1 mvninstall 0m 36s the patch passed +1 compile 0m 38s the patch passed +1 javac 0m 38s the patch passed -0 checkstyle 0m 19s hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: The patch generated 1 new + 1 unchanged - 0 fixed = 2 total (was 1) +1 mvnsite 0m 37s the patch passed +1 mvneclipse 0m 15s the patch passed +1 whitespace 0m 0s The patch has no whitespace issues. +1 findbugs 1m 18s the patch passed +1 javadoc 0m 21s the patch passed -1 unit 41m 55s hadoop-yarn-server-resourcemanager in the patch failed. +1 asflicense 0m 18s The patch does not generate ASF License warnings. 67m 14s Reason Tests Failed junit tests hadoop.yarn.server.resourcemanager.TestRMRestart Subsystem Report/Notes Docker Image:yetus/hadoop:a9ad5d6 JIRA Issue YARN-6061 GITHUB PR https://github.com/apache/hadoop/pull/182 Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle uname Linux 76d62bc5f048 3.13.0-105-generic #152-Ubuntu SMP Fri Dec 2 15:37:11 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux Build tool maven Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh git revision trunk / 5d8b80e Default Java 1.8.0_111 findbugs v3.0.0 checkstyle https://builds.apache.org/job/PreCommit-YARN-Build/14713/artifact/patchprocess/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt unit https://builds.apache.org/job/PreCommit-YARN-Build/14713/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt Test Results https://builds.apache.org/job/PreCommit-YARN-Build/14713/testReport/ modules C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager Console output https://builds.apache.org/job/PreCommit-YARN-Build/14713/console Powered by Apache Yetus 0.5.0-SNAPSHOT http://yetus.apache.org This message was automatically generated.
          Hide
          kasha Karthik Kambatla added a comment -

          Now I know why my perception of reality is different from reality. YARN-2579 has moved transitioning the RM to standby from RMFatalEventDispatcher to RMStateStore to avoid a deadlock in the RM.

          Can we have RMCriticalThreadUncaughtExceptionHandler mimic the same behavior? Maybe, share StandByTransitionThread?

          Show
          kasha Karthik Kambatla added a comment - Now I know why my perception of reality is different from reality. YARN-2579 has moved transitioning the RM to standby from RMFatalEventDispatcher to RMStateStore to avoid a deadlock in the RM. Can we have RMCriticalThreadUncaughtExceptionHandler mimic the same behavior? Maybe, share StandByTransitionThread?
          Hide
          yufeigu Yufei Gu added a comment -

          Digging through YARN-2579, it is not a good idea to do transitionToStandby in RMFatalEventDispatcher because of following call sequences:

          [eventHandlingThread]: RMFatalEventDispatcher handler => transitionToStandby() => ResetDispatch() => serviceStop() => eventHandlingThread.join(). 
          

          Instead we could reuse the StandByTransitionThread in the class RMStateStore to do transition to standby.

          Show
          yufeigu Yufei Gu added a comment - Digging through YARN-2579 , it is not a good idea to do transitionToStandby in RMFatalEventDispatcher because of following call sequences: [eventHandlingThread]: RMFatalEventDispatcher handler => transitionToStandby() => ResetDispatch() => serviceStop() => eventHandlingThread.join(). Instead we could reuse the StandByTransitionThread in the class RMStateStore to do transition to standby.
          Hide
          yufeigu Yufei Gu added a comment -

          Uploaded a new patch to transition RM to standby or shut it down. Doesn't remove RMFatalEvent as YARN-2579 and YARN-2814 suggested. Maybe we could do it in another JIRA.

          Show
          yufeigu Yufei Gu added a comment - Uploaded a new patch to transition RM to standby or shut it down. Doesn't remove RMFatalEvent as YARN-2579 and YARN-2814 suggested. Maybe we could do it in another JIRA.
          Hide
          hadoopqa Hadoop QA added a comment -
          -1 overall



          Vote Subsystem Runtime Comment
          0 reexec 0m 25s Docker mode activated.
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 test4tests 0m 0s The patch appears to include 1 new or modified test files.
          +1 mvninstall 14m 24s trunk passed
          +1 compile 0m 38s trunk passed
          +1 checkstyle 0m 23s trunk passed
          +1 mvnsite 0m 38s trunk passed
          +1 mvneclipse 0m 16s trunk passed
          +1 findbugs 1m 11s trunk passed
          +1 javadoc 0m 21s trunk passed
          +1 mvninstall 0m 35s the patch passed
          +1 compile 0m 35s the patch passed
          +1 javac 0m 35s the patch passed
          -0 checkstyle 0m 21s hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: The patch generated 1 new + 59 unchanged - 0 fixed = 60 total (was 59)
          +1 mvnsite 0m 37s the patch passed
          +1 mvneclipse 0m 15s the patch passed
          +1 whitespace 0m 0s The patch has no whitespace issues.
          +1 findbugs 1m 18s the patch passed
          +1 javadoc 0m 21s the patch passed
          -1 unit 39m 51s hadoop-yarn-server-resourcemanager in the patch failed.
          +1 asflicense 0m 20s The patch does not generate ASF License warnings.
          63m 49s



          Reason Tests
          Failed junit tests hadoop.yarn.server.resourcemanager.TestRMRestart
            hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesDelegationTokens



          Subsystem Report/Notes
          Docker Image:yetus/hadoop:a9ad5d6
          JIRA Issue YARN-6061
          GITHUB PR https://github.com/apache/hadoop/pull/182
          Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
          uname Linux 7c3051f1378c 3.13.0-95-generic #142-Ubuntu SMP Fri Aug 12 17:00:09 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
          Build tool maven
          Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
          git revision trunk / 312b36d
          Default Java 1.8.0_121
          findbugs v3.0.0
          checkstyle https://builds.apache.org/job/PreCommit-YARN-Build/14787/artifact/patchprocess/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt
          unit https://builds.apache.org/job/PreCommit-YARN-Build/14787/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt
          Test Results https://builds.apache.org/job/PreCommit-YARN-Build/14787/testReport/
          modules C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
          Console output https://builds.apache.org/job/PreCommit-YARN-Build/14787/console
          Powered by Apache Yetus 0.5.0-SNAPSHOT http://yetus.apache.org

          This message was automatically generated.

          Show
          hadoopqa Hadoop QA added a comment - -1 overall Vote Subsystem Runtime Comment 0 reexec 0m 25s Docker mode activated. +1 @author 0m 0s The patch does not contain any @author tags. +1 test4tests 0m 0s The patch appears to include 1 new or modified test files. +1 mvninstall 14m 24s trunk passed +1 compile 0m 38s trunk passed +1 checkstyle 0m 23s trunk passed +1 mvnsite 0m 38s trunk passed +1 mvneclipse 0m 16s trunk passed +1 findbugs 1m 11s trunk passed +1 javadoc 0m 21s trunk passed +1 mvninstall 0m 35s the patch passed +1 compile 0m 35s the patch passed +1 javac 0m 35s the patch passed -0 checkstyle 0m 21s hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: The patch generated 1 new + 59 unchanged - 0 fixed = 60 total (was 59) +1 mvnsite 0m 37s the patch passed +1 mvneclipse 0m 15s the patch passed +1 whitespace 0m 0s The patch has no whitespace issues. +1 findbugs 1m 18s the patch passed +1 javadoc 0m 21s the patch passed -1 unit 39m 51s hadoop-yarn-server-resourcemanager in the patch failed. +1 asflicense 0m 20s The patch does not generate ASF License warnings. 63m 49s Reason Tests Failed junit tests hadoop.yarn.server.resourcemanager.TestRMRestart   hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesDelegationTokens Subsystem Report/Notes Docker Image:yetus/hadoop:a9ad5d6 JIRA Issue YARN-6061 GITHUB PR https://github.com/apache/hadoop/pull/182 Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle uname Linux 7c3051f1378c 3.13.0-95-generic #142-Ubuntu SMP Fri Aug 12 17:00:09 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux Build tool maven Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh git revision trunk / 312b36d Default Java 1.8.0_121 findbugs v3.0.0 checkstyle https://builds.apache.org/job/PreCommit-YARN-Build/14787/artifact/patchprocess/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt unit https://builds.apache.org/job/PreCommit-YARN-Build/14787/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt Test Results https://builds.apache.org/job/PreCommit-YARN-Build/14787/testReport/ modules C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager Console output https://builds.apache.org/job/PreCommit-YARN-Build/14787/console Powered by Apache Yetus 0.5.0-SNAPSHOT http://yetus.apache.org This message was automatically generated.
          Hide
          yufeigu Yufei Gu added a comment -

          Testing these test failures locally, they are unrelated.

          Show
          yufeigu Yufei Gu added a comment - Testing these test failures locally, they are unrelated.
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user kambatla commented on a diff in the pull request:

          https://github.com/apache/hadoop/pull/182#discussion_r98558323

          — Diff: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMCriticalThreadUncaughtExceptionHandler.java —
          @@ -0,0 +1,65 @@
          +/**
          + * Licensed to the Apache Software Foundation (ASF) under one
          + * or more contributor license agreements. See the NOTICE file
          + * distributed with this work for additional information
          + * regarding copyright ownership. The ASF licenses this file
          + * to you under the Apache License, Version 2.0 (the
          + * "License"); you may not use this file except in compliance
          + * with the License. You may obtain a copy of the License at
          + *
          + * http://www.apache.org/licenses/LICENSE-2.0
          + *
          + * Unless required by applicable law or agreed to in writing, software
          + * distributed under the License is distributed on an "AS IS" BASIS,
          + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
          + * See the License for the specific language governing permissions and
          + * limitations under the License.
          + */
          +
          +package org.apache.hadoop.yarn.server.resourcemanager;
          +
          +import java.lang.Thread.UncaughtExceptionHandler;
          +
          +import org.apache.commons.logging.Log;
          +import org.apache.commons.logging.LogFactory;
          +import org.apache.hadoop.classification.InterfaceAudience.Public;
          +import org.apache.hadoop.classification.InterfaceStability.Evolving;
          +import org.apache.hadoop.yarn.conf.HAUtil;
          +
          +/**
          + * This class either shutdowns

          {@link ResourceManager} or makes
          + * {@link ResourceManager}

          transition to standby state if any uncaught exception
          + * causes critical threads crash. It is intended to be installed by calling
          + *

          {@code setUncaughtExceptionHandler(Thread.UncaughtExceptionHandler)}

          + * in the thread entry point or after creation of threads.
          + */
          +@Public
          +@Evolving
          +public class RMCriticalThreadUncaughtExceptionHandler
          + implements UncaughtExceptionHandler {
          + private static final Log LOG = LogFactory.getLog(
          + RMCriticalThreadUncaughtExceptionHandler.class);
          + private RMContext rmContext;
          +
          + public RMCriticalThreadUncaughtExceptionHandler(RMContext rmContext)

          { + this.rmContext = rmContext; + }

          +
          + @Override
          + public void uncaughtException(Thread t, Throwable e) {
          + LOG.fatal("Critical thread " + t + " crashed!", e);
          +
          + if (HAUtil.isHAEnabled(rmContext.getYarnConfiguration())) {
          + new Thread() {
          + @Override
          + public void run() {
          + rmContext.getResourceManager().handleTransitionToStandBy();
          — End diff –

          Are we sure we can call this synchronously? Would it be more appropriate to do the transition to standby in a different thread, like in RMStateStore? If yes, may be we could update RMStateStore to use this too?

          Show
          githubbot ASF GitHub Bot added a comment - Github user kambatla commented on a diff in the pull request: https://github.com/apache/hadoop/pull/182#discussion_r98558323 — Diff: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMCriticalThreadUncaughtExceptionHandler.java — @@ -0,0 +1,65 @@ +/** + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.hadoop.yarn.server.resourcemanager; + +import java.lang.Thread.UncaughtExceptionHandler; + +import org.apache.commons.logging.Log; +import org.apache.commons.logging.LogFactory; +import org.apache.hadoop.classification.InterfaceAudience.Public; +import org.apache.hadoop.classification.InterfaceStability.Evolving; +import org.apache.hadoop.yarn.conf.HAUtil; + +/** + * This class either shutdowns {@link ResourceManager} or makes + * {@link ResourceManager} transition to standby state if any uncaught exception + * causes critical threads crash. It is intended to be installed by calling + * {@code setUncaughtExceptionHandler(Thread.UncaughtExceptionHandler)} + * in the thread entry point or after creation of threads. + */ +@Public +@Evolving +public class RMCriticalThreadUncaughtExceptionHandler + implements UncaughtExceptionHandler { + private static final Log LOG = LogFactory.getLog( + RMCriticalThreadUncaughtExceptionHandler.class); + private RMContext rmContext; + + public RMCriticalThreadUncaughtExceptionHandler(RMContext rmContext) { + this.rmContext = rmContext; + } + + @Override + public void uncaughtException(Thread t, Throwable e) { + LOG.fatal("Critical thread " + t + " crashed!", e); + + if (HAUtil.isHAEnabled(rmContext.getYarnConfiguration())) { + new Thread() { + @Override + public void run() { + rmContext.getResourceManager().handleTransitionToStandBy(); — End diff – Are we sure we can call this synchronously? Would it be more appropriate to do the transition to standby in a different thread, like in RMStateStore? If yes, may be we could update RMStateStore to use this too?
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user kambatla commented on a diff in the pull request:

          https://github.com/apache/hadoop/pull/182#discussion_r98564857

          — Diff: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMCriticalThreadUncaughtExceptionHandler.java —
          @@ -0,0 +1,75 @@
          +/**
          + * Licensed to the Apache Software Foundation (ASF) under one
          + * or more contributor license agreements. See the NOTICE file
          + * distributed with this work for additional information
          + * regarding copyright ownership. The ASF licenses this file
          + * to you under the Apache License, Version 2.0 (the
          + * "License"); you may not use this file except in compliance
          + * with the License. You may obtain a copy of the License at
          + *
          + * http://www.apache.org/licenses/LICENSE-2.0
          + *
          + * Unless required by applicable law or agreed to in writing, software
          + * distributed under the License is distributed on an "AS IS" BASIS,
          + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
          + * See the License for the specific language governing permissions and
          + * limitations under the License.
          + */
          +
          +package org.apache.hadoop.yarn.server.resourcemanager;
          +
          +import org.apache.hadoop.util.ExitUtil;
          +import org.apache.hadoop.yarn.event.AsyncDispatcher;
          +import org.junit.Test;
          +
          +import static org.junit.Assert.assertSame;
          +import static org.mockito.Mockito.spy;
          +import static org.mockito.Mockito.verify;
          +
          +/**
          + * This class is to test

          {@link RMCriticalThreadUncaughtExceptionHandler}.
          + */
          +public class TestRMCriticalThreadUncaughtExceptionHandler {
          + /**
          + * Throw {@link RuntimeException} inside thread and
          + * check {@link RMCriticalThreadUncaughtExceptionHandler}

          instance.
          + *
          + * Used

          {@link ExitUtil}

          class to avoid jvm exit through
          + *

          {@code System.exit(-1)}

          .
          + *
          + * @throws InterruptedException if any
          + */
          + @Test
          + public void testUncaughtExceptionHandlerWithError()
          + throws InterruptedException {
          + ExitUtil.disableSystemExit();
          +
          + // Create a MockRM and start it
          + ResourceManager resourceManager = new MockRM();
          — End diff –

          If we are anyways creating the MockRM, why not start it and have some thread throw an uncaught exception and verify the RM behavior in two cases (two tests?) - exit and transition to standby?

          Show
          githubbot ASF GitHub Bot added a comment - Github user kambatla commented on a diff in the pull request: https://github.com/apache/hadoop/pull/182#discussion_r98564857 — Diff: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMCriticalThreadUncaughtExceptionHandler.java — @@ -0,0 +1,75 @@ +/** + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.hadoop.yarn.server.resourcemanager; + +import org.apache.hadoop.util.ExitUtil; +import org.apache.hadoop.yarn.event.AsyncDispatcher; +import org.junit.Test; + +import static org.junit.Assert.assertSame; +import static org.mockito.Mockito.spy; +import static org.mockito.Mockito.verify; + +/** + * This class is to test {@link RMCriticalThreadUncaughtExceptionHandler}. + */ +public class TestRMCriticalThreadUncaughtExceptionHandler { + /** + * Throw {@link RuntimeException} inside thread and + * check {@link RMCriticalThreadUncaughtExceptionHandler} instance. + * + * Used {@link ExitUtil} class to avoid jvm exit through + * {@code System.exit(-1)} . + * + * @throws InterruptedException if any + */ + @Test + public void testUncaughtExceptionHandlerWithError() + throws InterruptedException { + ExitUtil.disableSystemExit(); + + // Create a MockRM and start it + ResourceManager resourceManager = new MockRM(); — End diff – If we are anyways creating the MockRM, why not start it and have some thread throw an uncaught exception and verify the RM behavior in two cases (two tests?) - exit and transition to standby?
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user kambatla commented on a diff in the pull request:

          https://github.com/apache/hadoop/pull/182#discussion_r98557903

          — Diff: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMCriticalThreadUncaughtExceptionHandler.java —
          @@ -0,0 +1,65 @@
          +/**
          + * Licensed to the Apache Software Foundation (ASF) under one
          + * or more contributor license agreements. See the NOTICE file
          + * distributed with this work for additional information
          + * regarding copyright ownership. The ASF licenses this file
          + * to you under the Apache License, Version 2.0 (the
          + * "License"); you may not use this file except in compliance
          + * with the License. You may obtain a copy of the License at
          + *
          + * http://www.apache.org/licenses/LICENSE-2.0
          + *
          + * Unless required by applicable law or agreed to in writing, software
          + * distributed under the License is distributed on an "AS IS" BASIS,
          + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
          + * See the License for the specific language governing permissions and
          + * limitations under the License.
          + */
          +
          +package org.apache.hadoop.yarn.server.resourcemanager;
          +
          +import java.lang.Thread.UncaughtExceptionHandler;
          +
          +import org.apache.commons.logging.Log;
          +import org.apache.commons.logging.LogFactory;
          +import org.apache.hadoop.classification.InterfaceAudience.Public;
          +import org.apache.hadoop.classification.InterfaceStability.Evolving;
          +import org.apache.hadoop.yarn.conf.HAUtil;
          +
          +/**
          + * This class either shutdowns

          {@link ResourceManager} or makes
          + * {@link ResourceManager}

          transition to standby state if any uncaught exception
          + * causes critical threads crash. It is intended to be installed by calling
          + *

          {@code setUncaughtExceptionHandler(Thread.UncaughtExceptionHandler)}

          + * in the thread entry point or after creation of threads.
          + */
          +@Public
          +@Evolving
          +public class RMCriticalThreadUncaughtExceptionHandler
          + implements UncaughtExceptionHandler {
          + private static final Log LOG = LogFactory.getLog(
          + RMCriticalThreadUncaughtExceptionHandler.class);
          + private RMContext rmContext;
          +
          + public RMCriticalThreadUncaughtExceptionHandler(RMContext rmContext)

          { + this.rmContext = rmContext; + }

          +
          + @Override
          + public void uncaughtException(Thread t, Throwable e) {
          + LOG.fatal("Critical thread " + t + " crashed!", e);
          — End diff –

          Should we log t.getName() instead of t itself?

          Show
          githubbot ASF GitHub Bot added a comment - Github user kambatla commented on a diff in the pull request: https://github.com/apache/hadoop/pull/182#discussion_r98557903 — Diff: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMCriticalThreadUncaughtExceptionHandler.java — @@ -0,0 +1,65 @@ +/** + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.hadoop.yarn.server.resourcemanager; + +import java.lang.Thread.UncaughtExceptionHandler; + +import org.apache.commons.logging.Log; +import org.apache.commons.logging.LogFactory; +import org.apache.hadoop.classification.InterfaceAudience.Public; +import org.apache.hadoop.classification.InterfaceStability.Evolving; +import org.apache.hadoop.yarn.conf.HAUtil; + +/** + * This class either shutdowns {@link ResourceManager} or makes + * {@link ResourceManager} transition to standby state if any uncaught exception + * causes critical threads crash. It is intended to be installed by calling + * {@code setUncaughtExceptionHandler(Thread.UncaughtExceptionHandler)} + * in the thread entry point or after creation of threads. + */ +@Public +@Evolving +public class RMCriticalThreadUncaughtExceptionHandler + implements UncaughtExceptionHandler { + private static final Log LOG = LogFactory.getLog( + RMCriticalThreadUncaughtExceptionHandler.class); + private RMContext rmContext; + + public RMCriticalThreadUncaughtExceptionHandler(RMContext rmContext) { + this.rmContext = rmContext; + } + + @Override + public void uncaughtException(Thread t, Throwable e) { + LOG.fatal("Critical thread " + t + " crashed!", e); — End diff – Should we log t.getName() instead of t itself?
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user kambatla commented on a diff in the pull request:

          https://github.com/apache/hadoop/pull/182#discussion_r98558016

          — Diff: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMCriticalThreadUncaughtExceptionHandler.java —
          @@ -0,0 +1,65 @@
          +/**
          + * Licensed to the Apache Software Foundation (ASF) under one
          + * or more contributor license agreements. See the NOTICE file
          + * distributed with this work for additional information
          + * regarding copyright ownership. The ASF licenses this file
          + * to you under the Apache License, Version 2.0 (the
          + * "License"); you may not use this file except in compliance
          + * with the License. You may obtain a copy of the License at
          + *
          + * http://www.apache.org/licenses/LICENSE-2.0
          + *
          + * Unless required by applicable law or agreed to in writing, software
          + * distributed under the License is distributed on an "AS IS" BASIS,
          + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
          + * See the License for the specific language governing permissions and
          + * limitations under the License.
          + */
          +
          +package org.apache.hadoop.yarn.server.resourcemanager;
          +
          +import java.lang.Thread.UncaughtExceptionHandler;
          +
          +import org.apache.commons.logging.Log;
          +import org.apache.commons.logging.LogFactory;
          +import org.apache.hadoop.classification.InterfaceAudience.Public;
          +import org.apache.hadoop.classification.InterfaceStability.Evolving;
          +import org.apache.hadoop.yarn.conf.HAUtil;
          +
          +/**
          + * This class either shutdowns

          {@link ResourceManager} or makes
          + * {@link ResourceManager}

          transition to standby state if any uncaught exception
          + * causes critical threads crash. It is intended to be installed by calling
          + *

          {@code setUncaughtExceptionHandler(Thread.UncaughtExceptionHandler)}

          + * in the thread entry point or after creation of threads.
          + */
          +@Public
          +@Evolving
          +public class RMCriticalThreadUncaughtExceptionHandler
          + implements UncaughtExceptionHandler {
          + private static final Log LOG = LogFactory.getLog(
          + RMCriticalThreadUncaughtExceptionHandler.class);
          + private RMContext rmContext;
          +
          + public RMCriticalThreadUncaughtExceptionHandler(RMContext rmContext)

          { + this.rmContext = rmContext; + }

          +
          + @Override
          + public void uncaughtException(Thread t, Throwable e) {
          + LOG.fatal("Critical thread " + t + " crashed!", e);
          +
          + if (HAUtil.isHAEnabled(rmContext.getYarnConfiguration())) {
          + new Thread() {
          + @Override
          + public void run()

          { + rmContext.getResourceManager().handleTransitionToStandBy(); + }

          + }.start();
          + } else

          { + rmContext.getDispatcher().getEventHandler().handle( + new RMFatalEvent(RMFatalEventType.CRITICAL_THREAD_CRASH, + new Exception(e))); + }

          + }
          +}
          — End diff –

          New line at the end of the file?

          Show
          githubbot ASF GitHub Bot added a comment - Github user kambatla commented on a diff in the pull request: https://github.com/apache/hadoop/pull/182#discussion_r98558016 — Diff: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMCriticalThreadUncaughtExceptionHandler.java — @@ -0,0 +1,65 @@ +/** + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.hadoop.yarn.server.resourcemanager; + +import java.lang.Thread.UncaughtExceptionHandler; + +import org.apache.commons.logging.Log; +import org.apache.commons.logging.LogFactory; +import org.apache.hadoop.classification.InterfaceAudience.Public; +import org.apache.hadoop.classification.InterfaceStability.Evolving; +import org.apache.hadoop.yarn.conf.HAUtil; + +/** + * This class either shutdowns {@link ResourceManager} or makes + * {@link ResourceManager} transition to standby state if any uncaught exception + * causes critical threads crash. It is intended to be installed by calling + * {@code setUncaughtExceptionHandler(Thread.UncaughtExceptionHandler)} + * in the thread entry point or after creation of threads. + */ +@Public +@Evolving +public class RMCriticalThreadUncaughtExceptionHandler + implements UncaughtExceptionHandler { + private static final Log LOG = LogFactory.getLog( + RMCriticalThreadUncaughtExceptionHandler.class); + private RMContext rmContext; + + public RMCriticalThreadUncaughtExceptionHandler(RMContext rmContext) { + this.rmContext = rmContext; + } + + @Override + public void uncaughtException(Thread t, Throwable e) { + LOG.fatal("Critical thread " + t + " crashed!", e); + + if (HAUtil.isHAEnabled(rmContext.getYarnConfiguration())) { + new Thread() { + @Override + public void run() { + rmContext.getResourceManager().handleTransitionToStandBy(); + } + }.start(); + } else { + rmContext.getDispatcher().getEventHandler().handle( + new RMFatalEvent(RMFatalEventType.CRITICAL_THREAD_CRASH, + new Exception(e))); + } + } +} — End diff – New line at the end of the file?
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user kambatla commented on a diff in the pull request:

          https://github.com/apache/hadoop/pull/182#discussion_r98558451

          — Diff: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMCriticalThreadUncaughtExceptionHandler.java —
          @@ -0,0 +1,75 @@
          +/**
          + * Licensed to the Apache Software Foundation (ASF) under one
          + * or more contributor license agreements. See the NOTICE file
          + * distributed with this work for additional information
          + * regarding copyright ownership. The ASF licenses this file
          + * to you under the Apache License, Version 2.0 (the
          + * "License"); you may not use this file except in compliance
          + * with the License. You may obtain a copy of the License at
          + *
          + * http://www.apache.org/licenses/LICENSE-2.0
          + *
          + * Unless required by applicable law or agreed to in writing, software
          + * distributed under the License is distributed on an "AS IS" BASIS,
          + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
          + * See the License for the specific language governing permissions and
          + * limitations under the License.
          + */
          +
          +package org.apache.hadoop.yarn.server.resourcemanager;
          +
          +import org.apache.hadoop.util.ExitUtil;
          +import org.apache.hadoop.yarn.event.AsyncDispatcher;
          +import org.junit.Test;
          +
          +import static org.junit.Assert.assertSame;
          +import static org.mockito.Mockito.spy;
          +import static org.mockito.Mockito.verify;
          +
          +/**
          + * This class is to test

          {@link RMCriticalThreadUncaughtExceptionHandler}

          .
          — End diff –

          Not sure if this javadoc is required. This is a convention we generally follow.

          Show
          githubbot ASF GitHub Bot added a comment - Github user kambatla commented on a diff in the pull request: https://github.com/apache/hadoop/pull/182#discussion_r98558451 — Diff: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMCriticalThreadUncaughtExceptionHandler.java — @@ -0,0 +1,75 @@ +/** + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.hadoop.yarn.server.resourcemanager; + +import org.apache.hadoop.util.ExitUtil; +import org.apache.hadoop.yarn.event.AsyncDispatcher; +import org.junit.Test; + +import static org.junit.Assert.assertSame; +import static org.mockito.Mockito.spy; +import static org.mockito.Mockito.verify; + +/** + * This class is to test {@link RMCriticalThreadUncaughtExceptionHandler} . — End diff – Not sure if this javadoc is required. This is a convention we generally follow.
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user kambatla commented on a diff in the pull request:

          https://github.com/apache/hadoop/pull/182#discussion_r98580210

          — Diff: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMCriticalThreadUncaughtExceptionHandler.java —
          @@ -0,0 +1,65 @@
          +/**
          + * Licensed to the Apache Software Foundation (ASF) under one
          + * or more contributor license agreements. See the NOTICE file
          + * distributed with this work for additional information
          + * regarding copyright ownership. The ASF licenses this file
          + * to you under the Apache License, Version 2.0 (the
          + * "License"); you may not use this file except in compliance
          + * with the License. You may obtain a copy of the License at
          + *
          + * http://www.apache.org/licenses/LICENSE-2.0
          + *
          + * Unless required by applicable law or agreed to in writing, software
          + * distributed under the License is distributed on an "AS IS" BASIS,
          + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
          + * See the License for the specific language governing permissions and
          + * limitations under the License.
          + */
          +
          +package org.apache.hadoop.yarn.server.resourcemanager;
          +
          +import java.lang.Thread.UncaughtExceptionHandler;
          +
          +import org.apache.commons.logging.Log;
          +import org.apache.commons.logging.LogFactory;
          +import org.apache.hadoop.classification.InterfaceAudience.Public;
          +import org.apache.hadoop.classification.InterfaceStability.Evolving;
          +import org.apache.hadoop.yarn.conf.HAUtil;
          +
          +/**
          + * This class either shutdowns

          {@link ResourceManager} or makes
          + * {@link ResourceManager}

          transition to standby state if any uncaught exception
          + * causes critical threads crash. It is intended to be installed by calling
          + *

          {@code setUncaughtExceptionHandler(Thread.UncaughtExceptionHandler)}

          + * in the thread entry point or after creation of threads.
          + */
          +@Public
          +@Evolving
          +public class RMCriticalThreadUncaughtExceptionHandler
          + implements UncaughtExceptionHandler {
          + private static final Log LOG = LogFactory.getLog(
          + RMCriticalThreadUncaughtExceptionHandler.class);
          + private RMContext rmContext;
          +
          + public RMCriticalThreadUncaughtExceptionHandler(RMContext rmContext)

          { + this.rmContext = rmContext; + }

          +
          + @Override
          + public void uncaughtException(Thread t, Throwable e) {
          + LOG.fatal("Critical thread " + t + " crashed!", e);
          +
          + if (HAUtil.isHAEnabled(rmContext.getYarnConfiguration())) {
          + new Thread() {
          + @Override
          + public void run() {
          + rmContext.getResourceManager().handleTransitionToStandBy();
          — End diff –

          My bad. I see this is being done in a new thread.

          That said, may be, we could consider abstracting this out so this handler and the RMStateStore just call that abstraction?

          Show
          githubbot ASF GitHub Bot added a comment - Github user kambatla commented on a diff in the pull request: https://github.com/apache/hadoop/pull/182#discussion_r98580210 — Diff: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMCriticalThreadUncaughtExceptionHandler.java — @@ -0,0 +1,65 @@ +/** + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.hadoop.yarn.server.resourcemanager; + +import java.lang.Thread.UncaughtExceptionHandler; + +import org.apache.commons.logging.Log; +import org.apache.commons.logging.LogFactory; +import org.apache.hadoop.classification.InterfaceAudience.Public; +import org.apache.hadoop.classification.InterfaceStability.Evolving; +import org.apache.hadoop.yarn.conf.HAUtil; + +/** + * This class either shutdowns {@link ResourceManager} or makes + * {@link ResourceManager} transition to standby state if any uncaught exception + * causes critical threads crash. It is intended to be installed by calling + * {@code setUncaughtExceptionHandler(Thread.UncaughtExceptionHandler)} + * in the thread entry point or after creation of threads. + */ +@Public +@Evolving +public class RMCriticalThreadUncaughtExceptionHandler + implements UncaughtExceptionHandler { + private static final Log LOG = LogFactory.getLog( + RMCriticalThreadUncaughtExceptionHandler.class); + private RMContext rmContext; + + public RMCriticalThreadUncaughtExceptionHandler(RMContext rmContext) { + this.rmContext = rmContext; + } + + @Override + public void uncaughtException(Thread t, Throwable e) { + LOG.fatal("Critical thread " + t + " crashed!", e); + + if (HAUtil.isHAEnabled(rmContext.getYarnConfiguration())) { + new Thread() { + @Override + public void run() { + rmContext.getResourceManager().handleTransitionToStandBy(); — End diff – My bad. I see this is being done in a new thread. That said, may be, we could consider abstracting this out so this handler and the RMStateStore just call that abstraction?
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user flyrain commented on a diff in the pull request:

          https://github.com/apache/hadoop/pull/182#discussion_r98765429

          — Diff: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMCriticalThreadUncaughtExceptionHandler.java —
          @@ -0,0 +1,75 @@
          +/**
          + * Licensed to the Apache Software Foundation (ASF) under one
          + * or more contributor license agreements. See the NOTICE file
          + * distributed with this work for additional information
          + * regarding copyright ownership. The ASF licenses this file
          + * to you under the Apache License, Version 2.0 (the
          + * "License"); you may not use this file except in compliance
          + * with the License. You may obtain a copy of the License at
          + *
          + * http://www.apache.org/licenses/LICENSE-2.0
          + *
          + * Unless required by applicable law or agreed to in writing, software
          + * distributed under the License is distributed on an "AS IS" BASIS,
          + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
          + * See the License for the specific language governing permissions and
          + * limitations under the License.
          + */
          +
          +package org.apache.hadoop.yarn.server.resourcemanager;
          +
          +import org.apache.hadoop.util.ExitUtil;
          +import org.apache.hadoop.yarn.event.AsyncDispatcher;
          +import org.junit.Test;
          +
          +import static org.junit.Assert.assertSame;
          +import static org.mockito.Mockito.spy;
          +import static org.mockito.Mockito.verify;
          +
          +/**
          + * This class is to test

          {@link RMCriticalThreadUncaughtExceptionHandler}.
          + */
          +public class TestRMCriticalThreadUncaughtExceptionHandler {
          + /**
          + * Throw {@link RuntimeException} inside thread and
          + * check {@link RMCriticalThreadUncaughtExceptionHandler}

          instance.
          + *
          + * Used

          {@link ExitUtil}

          class to avoid jvm exit through
          + *

          {@code System.exit(-1)}

          .
          + *
          + * @throws InterruptedException if any
          + */
          + @Test
          + public void testUncaughtExceptionHandlerWithError()
          + throws InterruptedException {
          + ExitUtil.disableSystemExit();
          +
          + // Create a MockRM and start it
          + ResourceManager resourceManager = new MockRM();
          — End diff –

          Add one unit test to check if RM transition to standby.

          Show
          githubbot ASF GitHub Bot added a comment - Github user flyrain commented on a diff in the pull request: https://github.com/apache/hadoop/pull/182#discussion_r98765429 — Diff: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMCriticalThreadUncaughtExceptionHandler.java — @@ -0,0 +1,75 @@ +/** + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.hadoop.yarn.server.resourcemanager; + +import org.apache.hadoop.util.ExitUtil; +import org.apache.hadoop.yarn.event.AsyncDispatcher; +import org.junit.Test; + +import static org.junit.Assert.assertSame; +import static org.mockito.Mockito.spy; +import static org.mockito.Mockito.verify; + +/** + * This class is to test {@link RMCriticalThreadUncaughtExceptionHandler}. + */ +public class TestRMCriticalThreadUncaughtExceptionHandler { + /** + * Throw {@link RuntimeException} inside thread and + * check {@link RMCriticalThreadUncaughtExceptionHandler} instance. + * + * Used {@link ExitUtil} class to avoid jvm exit through + * {@code System.exit(-1)} . + * + * @throws InterruptedException if any + */ + @Test + public void testUncaughtExceptionHandlerWithError() + throws InterruptedException { + ExitUtil.disableSystemExit(); + + // Create a MockRM and start it + ResourceManager resourceManager = new MockRM(); — End diff – Add one unit test to check if RM transition to standby.
          Hide
          hadoopqa Hadoop QA added a comment -
          -1 overall



          Vote Subsystem Runtime Comment
          0 reexec 0m 17s Docker mode activated.
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 test4tests 0m 0s The patch appears to include 2 new or modified test files.
          0 mvndep 0m 9s Maven dependency ordering for branch
          +1 mvninstall 13m 46s trunk passed
          +1 compile 8m 37s trunk passed
          +1 checkstyle 0m 45s trunk passed
          +1 mvnsite 1m 13s trunk passed
          +1 mvneclipse 0m 41s trunk passed
          +1 findbugs 1m 49s trunk passed
          +1 javadoc 0m 53s trunk passed
          0 mvndep 0m 10s Maven dependency ordering for patch
          +1 mvninstall 0m 58s the patch passed
          +1 compile 6m 28s the patch passed
          +1 javac 6m 28s the patch passed
          -0 checkstyle 0m 46s hadoop-yarn-project/hadoop-yarn: The patch generated 4 new + 117 unchanged - 0 fixed = 121 total (was 117)
          +1 mvnsite 1m 5s the patch passed
          +1 mvneclipse 0m 37s the patch passed
          +1 whitespace 0m 0s The patch has no whitespace issues.
          +1 findbugs 1m 59s the patch passed
          +1 javadoc 0m 51s the patch passed
          -1 unit 43m 49s hadoop-yarn-server-resourcemanager in the patch failed.
          -1 unit 18m 5s hadoop-yarn-client in the patch failed.
          +1 asflicense 0m 34s The patch does not generate ASF License warnings.
          112m 2s



          Reason Tests
          Failed junit tests hadoop.yarn.server.resourcemanager.TestRMRestart
            hadoop.yarn.server.resourcemanager.recovery.TestZKRMStateStore
            hadoop.yarn.client.api.impl.TestDistributedScheduling



          Subsystem Report/Notes
          Docker Image:yetus/hadoop:a9ad5d6
          JIRA Issue YARN-6061
          GITHUB PR https://github.com/apache/hadoop/pull/182
          Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
          uname Linux 2e0e418716a3 3.13.0-95-generic #142-Ubuntu SMP Fri Aug 12 17:00:09 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
          Build tool maven
          Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
          git revision trunk / 258991d
          Default Java 1.8.0_121
          findbugs v3.0.0
          checkstyle https://builds.apache.org/job/PreCommit-YARN-Build/14801/artifact/patchprocess/diff-checkstyle-hadoop-yarn-project_hadoop-yarn.txt
          unit https://builds.apache.org/job/PreCommit-YARN-Build/14801/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt
          unit https://builds.apache.org/job/PreCommit-YARN-Build/14801/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-client.txt
          Test Results https://builds.apache.org/job/PreCommit-YARN-Build/14801/testReport/
          modules C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client U: hadoop-yarn-project/hadoop-yarn
          Console output https://builds.apache.org/job/PreCommit-YARN-Build/14801/console
          Powered by Apache Yetus 0.5.0-SNAPSHOT http://yetus.apache.org

          This message was automatically generated.

          Show
          hadoopqa Hadoop QA added a comment - -1 overall Vote Subsystem Runtime Comment 0 reexec 0m 17s Docker mode activated. +1 @author 0m 0s The patch does not contain any @author tags. +1 test4tests 0m 0s The patch appears to include 2 new or modified test files. 0 mvndep 0m 9s Maven dependency ordering for branch +1 mvninstall 13m 46s trunk passed +1 compile 8m 37s trunk passed +1 checkstyle 0m 45s trunk passed +1 mvnsite 1m 13s trunk passed +1 mvneclipse 0m 41s trunk passed +1 findbugs 1m 49s trunk passed +1 javadoc 0m 53s trunk passed 0 mvndep 0m 10s Maven dependency ordering for patch +1 mvninstall 0m 58s the patch passed +1 compile 6m 28s the patch passed +1 javac 6m 28s the patch passed -0 checkstyle 0m 46s hadoop-yarn-project/hadoop-yarn: The patch generated 4 new + 117 unchanged - 0 fixed = 121 total (was 117) +1 mvnsite 1m 5s the patch passed +1 mvneclipse 0m 37s the patch passed +1 whitespace 0m 0s The patch has no whitespace issues. +1 findbugs 1m 59s the patch passed +1 javadoc 0m 51s the patch passed -1 unit 43m 49s hadoop-yarn-server-resourcemanager in the patch failed. -1 unit 18m 5s hadoop-yarn-client in the patch failed. +1 asflicense 0m 34s The patch does not generate ASF License warnings. 112m 2s Reason Tests Failed junit tests hadoop.yarn.server.resourcemanager.TestRMRestart   hadoop.yarn.server.resourcemanager.recovery.TestZKRMStateStore   hadoop.yarn.client.api.impl.TestDistributedScheduling Subsystem Report/Notes Docker Image:yetus/hadoop:a9ad5d6 JIRA Issue YARN-6061 GITHUB PR https://github.com/apache/hadoop/pull/182 Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle uname Linux 2e0e418716a3 3.13.0-95-generic #142-Ubuntu SMP Fri Aug 12 17:00:09 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux Build tool maven Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh git revision trunk / 258991d Default Java 1.8.0_121 findbugs v3.0.0 checkstyle https://builds.apache.org/job/PreCommit-YARN-Build/14801/artifact/patchprocess/diff-checkstyle-hadoop-yarn-project_hadoop-yarn.txt unit https://builds.apache.org/job/PreCommit-YARN-Build/14801/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt unit https://builds.apache.org/job/PreCommit-YARN-Build/14801/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-client.txt Test Results https://builds.apache.org/job/PreCommit-YARN-Build/14801/testReport/ modules C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client U: hadoop-yarn-project/hadoop-yarn Console output https://builds.apache.org/job/PreCommit-YARN-Build/14801/console Powered by Apache Yetus 0.5.0-SNAPSHOT http://yetus.apache.org This message was automatically generated.
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user kambatla commented on a diff in the pull request:

          https://github.com/apache/hadoop/pull/182#discussion_r98808465

          — Diff: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/TestRMFailover.java —
          @@ -349,4 +356,94 @@ static String getRefreshURL(String url) {
          }
          return redirectUrl;
          }
          +
          + /**
          + * Throw

          {@link RuntimeException}

          inside a thread of
          + *

          {@link ResourceManager} with HA enabled and check if the
          + * {@link ResourceManager}

          is transited to standby state.
          + *
          + * @throws InterruptedException if any
          + */
          + @Test
          + public void testUncaughtExceptionHandlerWithHAEnabled()
          — End diff –

          Nice tests!

          Show
          githubbot ASF GitHub Bot added a comment - Github user kambatla commented on a diff in the pull request: https://github.com/apache/hadoop/pull/182#discussion_r98808465 — Diff: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/TestRMFailover.java — @@ -349,4 +356,94 @@ static String getRefreshURL(String url) { } return redirectUrl; } + + /** + * Throw {@link RuntimeException} inside a thread of + * {@link ResourceManager} with HA enabled and check if the + * {@link ResourceManager} is transited to standby state. + * + * @throws InterruptedException if any + */ + @Test + public void testUncaughtExceptionHandlerWithHAEnabled() — End diff – Nice tests!
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user kambatla commented on a diff in the pull request:

          https://github.com/apache/hadoop/pull/182#discussion_r98808165

          — Diff: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceManager.java —
          @@ -819,21 +820,29 @@ public void handle(RMFatalEvent event) {
          }
          }

          • public void handleTransitionToStandBy() {
          • if (rmContext.isHAEnabled()) {
          • try {
          • // Transition to standby and reinit active services
          • LOG.info("Transitioning RM to Standby mode");
          • transitionToStandby(true);
          • EmbeddedElector elector = rmContext.getLeaderElectorService();
          • if (elector != null) {
          • elector.rejoinElection();
            + /**
            + * Transition to standby in a new thread.
            + */
            + public void handleTransitionToStandByInNewThread() {
            + new Thread() {
              • End diff –

          Instead of using an anonymous class, can we define this as a separate Thread and name it for easier debugging?

          Show
          githubbot ASF GitHub Bot added a comment - Github user kambatla commented on a diff in the pull request: https://github.com/apache/hadoop/pull/182#discussion_r98808165 — Diff: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceManager.java — @@ -819,21 +820,29 @@ public void handle(RMFatalEvent event) { } } public void handleTransitionToStandBy() { if (rmContext.isHAEnabled()) { try { // Transition to standby and reinit active services LOG.info("Transitioning RM to Standby mode"); transitionToStandby(true); EmbeddedElector elector = rmContext.getLeaderElectorService(); if (elector != null) { elector.rejoinElection(); + /** + * Transition to standby in a new thread. + */ + public void handleTransitionToStandByInNewThread() { + new Thread() { End diff – Instead of using an anonymous class, can we define this as a separate Thread and name it for easier debugging?
          Hide
          hadoopqa Hadoop QA added a comment -
          +1 overall



          Vote Subsystem Runtime Comment
          0 reexec 0m 18s Docker mode activated.
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 test4tests 0m 0s The patch appears to include 2 new or modified test files.
          0 mvndep 0m 45s Maven dependency ordering for branch
          +1 mvninstall 13m 21s trunk passed
          +1 compile 7m 53s trunk passed
          +1 checkstyle 0m 49s trunk passed
          +1 mvnsite 1m 4s trunk passed
          +1 mvneclipse 0m 39s trunk passed
          +1 findbugs 1m 38s trunk passed
          +1 javadoc 0m 44s trunk passed
          0 mvndep 0m 10s Maven dependency ordering for patch
          +1 mvninstall 0m 53s the patch passed
          +1 compile 6m 25s the patch passed
          +1 javac 6m 25s the patch passed
          -0 checkstyle 0m 44s hadoop-yarn-project/hadoop-yarn: The patch generated 4 new + 117 unchanged - 0 fixed = 121 total (was 117)
          +1 mvnsite 1m 7s the patch passed
          +1 mvneclipse 0m 37s the patch passed
          +1 whitespace 0m 0s The patch has no whitespace issues.
          +1 findbugs 1m 50s the patch passed
          +1 javadoc 0m 44s the patch passed
          +1 unit 39m 25s hadoop-yarn-server-resourcemanager in the patch passed.
          +1 unit 16m 53s hadoop-yarn-client in the patch passed.
          +1 asflicense 0m 31s The patch does not generate ASF License warnings.
          105m 0s



          Subsystem Report/Notes
          Docker Image:yetus/hadoop:a9ad5d6
          JIRA Issue YARN-6061
          GITHUB PR https://github.com/apache/hadoop/pull/182
          Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
          uname Linux d544b15ba7d0 3.13.0-95-generic #142-Ubuntu SMP Fri Aug 12 17:00:09 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
          Build tool maven
          Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
          git revision trunk / 3e06475
          Default Java 1.8.0_121
          findbugs v3.0.0
          checkstyle https://builds.apache.org/job/PreCommit-YARN-Build/14804/artifact/patchprocess/diff-checkstyle-hadoop-yarn-project_hadoop-yarn.txt
          Test Results https://builds.apache.org/job/PreCommit-YARN-Build/14804/testReport/
          modules C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client U: hadoop-yarn-project/hadoop-yarn
          Console output https://builds.apache.org/job/PreCommit-YARN-Build/14804/console
          Powered by Apache Yetus 0.5.0-SNAPSHOT http://yetus.apache.org

          This message was automatically generated.

          Show
          hadoopqa Hadoop QA added a comment - +1 overall Vote Subsystem Runtime Comment 0 reexec 0m 18s Docker mode activated. +1 @author 0m 0s The patch does not contain any @author tags. +1 test4tests 0m 0s The patch appears to include 2 new or modified test files. 0 mvndep 0m 45s Maven dependency ordering for branch +1 mvninstall 13m 21s trunk passed +1 compile 7m 53s trunk passed +1 checkstyle 0m 49s trunk passed +1 mvnsite 1m 4s trunk passed +1 mvneclipse 0m 39s trunk passed +1 findbugs 1m 38s trunk passed +1 javadoc 0m 44s trunk passed 0 mvndep 0m 10s Maven dependency ordering for patch +1 mvninstall 0m 53s the patch passed +1 compile 6m 25s the patch passed +1 javac 6m 25s the patch passed -0 checkstyle 0m 44s hadoop-yarn-project/hadoop-yarn: The patch generated 4 new + 117 unchanged - 0 fixed = 121 total (was 117) +1 mvnsite 1m 7s the patch passed +1 mvneclipse 0m 37s the patch passed +1 whitespace 0m 0s The patch has no whitespace issues. +1 findbugs 1m 50s the patch passed +1 javadoc 0m 44s the patch passed +1 unit 39m 25s hadoop-yarn-server-resourcemanager in the patch passed. +1 unit 16m 53s hadoop-yarn-client in the patch passed. +1 asflicense 0m 31s The patch does not generate ASF License warnings. 105m 0s Subsystem Report/Notes Docker Image:yetus/hadoop:a9ad5d6 JIRA Issue YARN-6061 GITHUB PR https://github.com/apache/hadoop/pull/182 Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle uname Linux d544b15ba7d0 3.13.0-95-generic #142-Ubuntu SMP Fri Aug 12 17:00:09 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux Build tool maven Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh git revision trunk / 3e06475 Default Java 1.8.0_121 findbugs v3.0.0 checkstyle https://builds.apache.org/job/PreCommit-YARN-Build/14804/artifact/patchprocess/diff-checkstyle-hadoop-yarn-project_hadoop-yarn.txt Test Results https://builds.apache.org/job/PreCommit-YARN-Build/14804/testReport/ modules C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client U: hadoop-yarn-project/hadoop-yarn Console output https://builds.apache.org/job/PreCommit-YARN-Build/14804/console Powered by Apache Yetus 0.5.0-SNAPSHOT http://yetus.apache.org This message was automatically generated.
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user kambatla commented on a diff in the pull request:

          https://github.com/apache/hadoop/pull/182#discussion_r98941743

          — Diff: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceManager.java —
          @@ -824,25 +824,29 @@ public void handle(RMFatalEvent event) {

          • Transition to standby in a new thread.
            */
            public void handleTransitionToStandByInNewThread() {
          • new Thread() {
          • @Override
          • public void run() {
          • if (rmContext.isHAEnabled()) {
          • try {
          • // Transition to standby and reinit active services
          • LOG.info("Transitioning RM to Standby mode");
          • transitionToStandby(true);
          • EmbeddedElector elector = rmContext.getLeaderElectorService();
          • if (elector != null) { - elector.rejoinElection(); - }
          • } catch (Exception e) {
          • LOG.fatal("Failed to transition RM to Standby mode.", e);
          • ExitUtil.terminate(1, e);
            + Thread standByTransitionThread = new Thread(new StandByTransitionThread());
              • End diff –

          Sorry for not identifying this earlier. We should make this thread-safe in case this is triggered by two critical threads failing at the same time.

          Show
          githubbot ASF GitHub Bot added a comment - Github user kambatla commented on a diff in the pull request: https://github.com/apache/hadoop/pull/182#discussion_r98941743 — Diff: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceManager.java — @@ -824,25 +824,29 @@ public void handle(RMFatalEvent event) { Transition to standby in a new thread. */ public void handleTransitionToStandByInNewThread() { new Thread() { @Override public void run() { if (rmContext.isHAEnabled()) { try { // Transition to standby and reinit active services LOG.info("Transitioning RM to Standby mode"); transitionToStandby(true); EmbeddedElector elector = rmContext.getLeaderElectorService(); if (elector != null) { - elector.rejoinElection(); - } } catch (Exception e) { LOG.fatal("Failed to transition RM to Standby mode.", e); ExitUtil.terminate(1, e); + Thread standByTransitionThread = new Thread(new StandByTransitionThread()); End diff – Sorry for not identifying this earlier. We should make this thread-safe in case this is triggered by two critical threads failing at the same time.
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user kambatla commented on a diff in the pull request:

          https://github.com/apache/hadoop/pull/182#discussion_r98942179

          — Diff: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceManager.java —
          @@ -824,25 +824,29 @@ public void handle(RMFatalEvent event) {

          • Transition to standby in a new thread.
            */
            public void handleTransitionToStandByInNewThread() {
          • new Thread() {
          • @Override
          • public void run() {
          • if (rmContext.isHAEnabled()) {
          • try {
          • // Transition to standby and reinit active services
          • LOG.info("Transitioning RM to Standby mode");
          • transitionToStandby(true);
          • EmbeddedElector elector = rmContext.getLeaderElectorService();
          • if (elector != null) { - elector.rejoinElection(); - }
          • } catch (Exception e) {
          • LOG.fatal("Failed to transition RM to Standby mode.", e);
          • ExitUtil.terminate(1, e);
            + Thread standByTransitionThread = new Thread(new StandByTransitionThread());
              • End diff –

          Also, would it make sense to create an instance of the Runnable on transition to active, and start a new thread on a need-to basis. If all threads use a single instance of the Runnable, may be it is easier to coordinate?

          Show
          githubbot ASF GitHub Bot added a comment - Github user kambatla commented on a diff in the pull request: https://github.com/apache/hadoop/pull/182#discussion_r98942179 — Diff: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceManager.java — @@ -824,25 +824,29 @@ public void handle(RMFatalEvent event) { Transition to standby in a new thread. */ public void handleTransitionToStandByInNewThread() { new Thread() { @Override public void run() { if (rmContext.isHAEnabled()) { try { // Transition to standby and reinit active services LOG.info("Transitioning RM to Standby mode"); transitionToStandby(true); EmbeddedElector elector = rmContext.getLeaderElectorService(); if (elector != null) { - elector.rejoinElection(); - } } catch (Exception e) { LOG.fatal("Failed to transition RM to Standby mode.", e); ExitUtil.terminate(1, e); + Thread standByTransitionThread = new Thread(new StandByTransitionThread()); End diff – Also, would it make sense to create an instance of the Runnable on transition to active, and start a new thread on a need-to basis. If all threads use a single instance of the Runnable, may be it is easier to coordinate?
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user kambatla commented on a diff in the pull request:

          https://github.com/apache/hadoop/pull/182#discussion_r98942361

          — Diff: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceManager.java —
          @@ -824,25 +824,29 @@ public void handle(RMFatalEvent event) {

          • Transition to standby in a new thread.
            */
            public void handleTransitionToStandByInNewThread() {
          • new Thread() {
          • @Override
          • public void run() {
          • if (rmContext.isHAEnabled()) {
          • try {
          • // Transition to standby and reinit active services
          • LOG.info("Transitioning RM to Standby mode");
          • transitionToStandby(true);
          • EmbeddedElector elector = rmContext.getLeaderElectorService();
          • if (elector != null) { - elector.rejoinElection(); - }
          • } catch (Exception e) { - LOG.fatal("Failed to transition RM to Standby mode.", e); - ExitUtil.terminate(1, e); + Thread standByTransitionThread = new Thread(new StandByTransitionThread()); + standByTransitionThread.setName("StandByTransitionThread Handler"); + standByTransitionThread.start(); + }

            +
            + private class StandByTransitionThread implements Runnable {

              • End diff –

          Naming the Runnable a Thread sounds confusing. Can we change it to TransitionToStandbyRunnable or some such?

          Show
          githubbot ASF GitHub Bot added a comment - Github user kambatla commented on a diff in the pull request: https://github.com/apache/hadoop/pull/182#discussion_r98942361 — Diff: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceManager.java — @@ -824,25 +824,29 @@ public void handle(RMFatalEvent event) { Transition to standby in a new thread. */ public void handleTransitionToStandByInNewThread() { new Thread() { @Override public void run() { if (rmContext.isHAEnabled()) { try { // Transition to standby and reinit active services LOG.info("Transitioning RM to Standby mode"); transitionToStandby(true); EmbeddedElector elector = rmContext.getLeaderElectorService(); if (elector != null) { - elector.rejoinElection(); - } } catch (Exception e) { - LOG.fatal("Failed to transition RM to Standby mode.", e); - ExitUtil.terminate(1, e); + Thread standByTransitionThread = new Thread(new StandByTransitionThread()); + standByTransitionThread.setName("StandByTransitionThread Handler"); + standByTransitionThread.start(); + } + + private class StandByTransitionThread implements Runnable { End diff – Naming the Runnable a Thread sounds confusing. Can we change it to TransitionToStandbyRunnable or some such?
          Hide
          hadoopqa Hadoop QA added a comment -
          -1 overall



          Vote Subsystem Runtime Comment
          0 reexec 0m 26s Docker mode activated.
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 test4tests 0m 0s The patch appears to include 2 new or modified test files.
          0 mvndep 0m 56s Maven dependency ordering for branch
          +1 mvninstall 14m 18s trunk passed
          +1 compile 8m 28s trunk passed
          +1 checkstyle 0m 48s trunk passed
          +1 mvnsite 1m 11s trunk passed
          +1 mvneclipse 0m 41s trunk passed
          +1 findbugs 1m 53s trunk passed
          +1 javadoc 0m 49s trunk passed
          0 mvndep 0m 10s Maven dependency ordering for patch
          +1 mvninstall 0m 54s the patch passed
          +1 compile 6m 53s the patch passed
          +1 javac 6m 53s the patch passed
          -0 checkstyle 0m 47s hadoop-yarn-project/hadoop-yarn: The patch generated 4 new + 117 unchanged - 0 fixed = 121 total (was 117)
          +1 mvnsite 1m 7s the patch passed
          +1 mvneclipse 0m 39s the patch passed
          +1 whitespace 0m 0s The patch has no whitespace issues.
          -1 findbugs 1m 15s hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager generated 1 new + 0 unchanged - 0 fixed = 1 total (was 0)
          +1 javadoc 0m 45s the patch passed
          +1 unit 40m 13s hadoop-yarn-server-resourcemanager in the patch passed.
          +1 unit 16m 48s hadoop-yarn-client in the patch passed.
          +1 asflicense 0m 38s The patch does not generate ASF License warnings.
          108m 54s



          Reason Tests
          FindBugs module:hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
            Inconsistent synchronization of org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.standByTransitionRunnable; locked 50% of time Unsynchronized access at ResourceManager.java:50% of time Unsynchronized access at ResourceManager.java:[line 240]



          Subsystem Report/Notes
          Docker Image:yetus/hadoop:a9ad5d6
          JIRA Issue YARN-6061
          GITHUB PR https://github.com/apache/hadoop/pull/182
          Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
          uname Linux ce9b124469f5 3.13.0-95-generic #142-Ubuntu SMP Fri Aug 12 17:00:09 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
          Build tool maven
          Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
          git revision trunk / 6aa09dc
          Default Java 1.8.0_121
          findbugs v3.0.0
          checkstyle https://builds.apache.org/job/PreCommit-YARN-Build/14807/artifact/patchprocess/diff-checkstyle-hadoop-yarn-project_hadoop-yarn.txt
          findbugs https://builds.apache.org/job/PreCommit-YARN-Build/14807/artifact/patchprocess/new-findbugs-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.html
          Test Results https://builds.apache.org/job/PreCommit-YARN-Build/14807/testReport/
          modules C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client U: hadoop-yarn-project/hadoop-yarn
          Console output https://builds.apache.org/job/PreCommit-YARN-Build/14807/console
          Powered by Apache Yetus 0.5.0-SNAPSHOT http://yetus.apache.org

          This message was automatically generated.

          Show
          hadoopqa Hadoop QA added a comment - -1 overall Vote Subsystem Runtime Comment 0 reexec 0m 26s Docker mode activated. +1 @author 0m 0s The patch does not contain any @author tags. +1 test4tests 0m 0s The patch appears to include 2 new or modified test files. 0 mvndep 0m 56s Maven dependency ordering for branch +1 mvninstall 14m 18s trunk passed +1 compile 8m 28s trunk passed +1 checkstyle 0m 48s trunk passed +1 mvnsite 1m 11s trunk passed +1 mvneclipse 0m 41s trunk passed +1 findbugs 1m 53s trunk passed +1 javadoc 0m 49s trunk passed 0 mvndep 0m 10s Maven dependency ordering for patch +1 mvninstall 0m 54s the patch passed +1 compile 6m 53s the patch passed +1 javac 6m 53s the patch passed -0 checkstyle 0m 47s hadoop-yarn-project/hadoop-yarn: The patch generated 4 new + 117 unchanged - 0 fixed = 121 total (was 117) +1 mvnsite 1m 7s the patch passed +1 mvneclipse 0m 39s the patch passed +1 whitespace 0m 0s The patch has no whitespace issues. -1 findbugs 1m 15s hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager generated 1 new + 0 unchanged - 0 fixed = 1 total (was 0) +1 javadoc 0m 45s the patch passed +1 unit 40m 13s hadoop-yarn-server-resourcemanager in the patch passed. +1 unit 16m 48s hadoop-yarn-client in the patch passed. +1 asflicense 0m 38s The patch does not generate ASF License warnings. 108m 54s Reason Tests FindBugs module:hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager   Inconsistent synchronization of org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.standByTransitionRunnable; locked 50% of time Unsynchronized access at ResourceManager.java:50% of time Unsynchronized access at ResourceManager.java: [line 240] Subsystem Report/Notes Docker Image:yetus/hadoop:a9ad5d6 JIRA Issue YARN-6061 GITHUB PR https://github.com/apache/hadoop/pull/182 Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle uname Linux ce9b124469f5 3.13.0-95-generic #142-Ubuntu SMP Fri Aug 12 17:00:09 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux Build tool maven Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh git revision trunk / 6aa09dc Default Java 1.8.0_121 findbugs v3.0.0 checkstyle https://builds.apache.org/job/PreCommit-YARN-Build/14807/artifact/patchprocess/diff-checkstyle-hadoop-yarn-project_hadoop-yarn.txt findbugs https://builds.apache.org/job/PreCommit-YARN-Build/14807/artifact/patchprocess/new-findbugs-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.html Test Results https://builds.apache.org/job/PreCommit-YARN-Build/14807/testReport/ modules C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client U: hadoop-yarn-project/hadoop-yarn Console output https://builds.apache.org/job/PreCommit-YARN-Build/14807/console Powered by Apache Yetus 0.5.0-SNAPSHOT http://yetus.apache.org This message was automatically generated.
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user kambatla commented on a diff in the pull request:

          https://github.com/apache/hadoop/pull/182#discussion_r99949838

          — Diff: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceManager.java —
          @@ -819,19 +824,39 @@ public void handle(RMFatalEvent event) {
          }
          }

          • public void handleTransitionToStandBy() {
          • if (rmContext.isHAEnabled()) {
          • try {
          • // Transition to standby and reinit active services
          • LOG.info("Transitioning RM to Standby mode");
          • transitionToStandby(true);
          • EmbeddedElector elector = rmContext.getLeaderElectorService();
          • if (elector != null) {
          • elector.rejoinElection();
            + /**
            + * Transition to standby in a new thread.
            + */
            + public void handleTransitionToStandByInNewThread() { + Thread standByTransitionThread = + new Thread(activeServices.standByTransitionRunnable); + standByTransitionThread.setName("StandByTransitionThread"); + standByTransitionThread.start(); + }

            +
            + private class StandByTransitionRunnable implements Runnable {
            + private AtomicBoolean hasRun = new AtomicBoolean(false);

              • End diff –

          Maybe, rename this to hasAlreadyRun? And, again add some javadoc here too?

          Show
          githubbot ASF GitHub Bot added a comment - Github user kambatla commented on a diff in the pull request: https://github.com/apache/hadoop/pull/182#discussion_r99949838 — Diff: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceManager.java — @@ -819,19 +824,39 @@ public void handle(RMFatalEvent event) { } } public void handleTransitionToStandBy() { if (rmContext.isHAEnabled()) { try { // Transition to standby and reinit active services LOG.info("Transitioning RM to Standby mode"); transitionToStandby(true); EmbeddedElector elector = rmContext.getLeaderElectorService(); if (elector != null) { elector.rejoinElection(); + /** + * Transition to standby in a new thread. + */ + public void handleTransitionToStandByInNewThread() { + Thread standByTransitionThread = + new Thread(activeServices.standByTransitionRunnable); + standByTransitionThread.setName("StandByTransitionThread"); + standByTransitionThread.start(); + } + + private class StandByTransitionRunnable implements Runnable { + private AtomicBoolean hasRun = new AtomicBoolean(false); End diff – Maybe, rename this to hasAlreadyRun? And, again add some javadoc here too?
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user kambatla commented on a diff in the pull request:

          https://github.com/apache/hadoop/pull/182#discussion_r99949771

          — Diff: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceManager.java —
          @@ -819,19 +824,39 @@ public void handle(RMFatalEvent event) {
          }
          }

          • public void handleTransitionToStandBy() {
          • if (rmContext.isHAEnabled()) {
          • try {
          • // Transition to standby and reinit active services
          • LOG.info("Transitioning RM to Standby mode");
          • transitionToStandby(true);
          • EmbeddedElector elector = rmContext.getLeaderElectorService();
          • if (elector != null) {
          • elector.rejoinElection();
            + /**
            + * Transition to standby in a new thread.
            + */
            + public void handleTransitionToStandByInNewThread() { + Thread standByTransitionThread = + new Thread(activeServices.standByTransitionRunnable); + standByTransitionThread.setName("StandByTransitionThread"); + standByTransitionThread.start(); + }

            +
            + private class StandByTransitionRunnable implements Runnable {
            + private AtomicBoolean hasRun = new AtomicBoolean(false);
            +
            + @Override
            + public void run() {
            + // Prevent from running again if it has run.

              • End diff –

          Add more detail here: "Run this only once, even if multiple threads end up triggering this simultaneously."

          Show
          githubbot ASF GitHub Bot added a comment - Github user kambatla commented on a diff in the pull request: https://github.com/apache/hadoop/pull/182#discussion_r99949771 — Diff: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceManager.java — @@ -819,19 +824,39 @@ public void handle(RMFatalEvent event) { } } public void handleTransitionToStandBy() { if (rmContext.isHAEnabled()) { try { // Transition to standby and reinit active services LOG.info("Transitioning RM to Standby mode"); transitionToStandby(true); EmbeddedElector elector = rmContext.getLeaderElectorService(); if (elector != null) { elector.rejoinElection(); + /** + * Transition to standby in a new thread. + */ + public void handleTransitionToStandByInNewThread() { + Thread standByTransitionThread = + new Thread(activeServices.standByTransitionRunnable); + standByTransitionThread.setName("StandByTransitionThread"); + standByTransitionThread.start(); + } + + private class StandByTransitionRunnable implements Runnable { + private AtomicBoolean hasRun = new AtomicBoolean(false); + + @Override + public void run() { + // Prevent from running again if it has run. End diff – Add more detail here: "Run this only once, even if multiple threads end up triggering this simultaneously."
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user kambatla commented on a diff in the pull request:

          https://github.com/apache/hadoop/pull/182#discussion_r99948742

          — Diff: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMCriticalThreadUncaughtExceptionHandler.java —
          @@ -0,0 +1,60 @@
          +/**
          + * Licensed to the Apache Software Foundation (ASF) under one
          + * or more contributor license agreements. See the NOTICE file
          + * distributed with this work for additional information
          + * regarding copyright ownership. The ASF licenses this file
          + * to you under the Apache License, Version 2.0 (the
          + * "License"); you may not use this file except in compliance
          + * with the License. You may obtain a copy of the License at
          + *
          + * http://www.apache.org/licenses/LICENSE-2.0
          + *
          + * Unless required by applicable law or agreed to in writing, software
          + * distributed under the License is distributed on an "AS IS" BASIS,
          + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
          + * See the License for the specific language governing permissions and
          + * limitations under the License.
          + */
          +
          +package org.apache.hadoop.yarn.server.resourcemanager;
          +
          +import java.lang.Thread.UncaughtExceptionHandler;
          +
          +import org.apache.commons.logging.Log;
          +import org.apache.commons.logging.LogFactory;
          +import org.apache.hadoop.classification.InterfaceAudience.Public;
          +import org.apache.hadoop.classification.InterfaceStability.Evolving;
          +import org.apache.hadoop.yarn.conf.HAUtil;
          +
          +/**
          + * This class either shutdowns

          {@link ResourceManager}

          or makes
          — End diff –

          • s/shutdowns/shuts down
          • s/makes RM transition/ transitions the RM
          • s/if any uncaught exception.../if a critical thread throws an uncaught exception.
          Show
          githubbot ASF GitHub Bot added a comment - Github user kambatla commented on a diff in the pull request: https://github.com/apache/hadoop/pull/182#discussion_r99948742 — Diff: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMCriticalThreadUncaughtExceptionHandler.java — @@ -0,0 +1,60 @@ +/** + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.hadoop.yarn.server.resourcemanager; + +import java.lang.Thread.UncaughtExceptionHandler; + +import org.apache.commons.logging.Log; +import org.apache.commons.logging.LogFactory; +import org.apache.hadoop.classification.InterfaceAudience.Public; +import org.apache.hadoop.classification.InterfaceStability.Evolving; +import org.apache.hadoop.yarn.conf.HAUtil; + +/** + * This class either shutdowns {@link ResourceManager} or makes — End diff – s/shutdowns/shuts down s/makes RM transition/ transitions the RM s/if any uncaught exception.../if a critical thread throws an uncaught exception.
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user kambatla commented on a diff in the pull request:

          https://github.com/apache/hadoop/pull/182#discussion_r99949581

          — Diff: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceManager.java —
          @@ -819,19 +824,39 @@ public void handle(RMFatalEvent event) {
          }
          }

          • public void handleTransitionToStandBy() {
          • if (rmContext.isHAEnabled()) {
          • try {
          • // Transition to standby and reinit active services
          • LOG.info("Transitioning RM to Standby mode");
          • transitionToStandby(true);
          • EmbeddedElector elector = rmContext.getLeaderElectorService();
          • if (elector != null) {
          • elector.rejoinElection();
            + /**
            + * Transition to standby in a new thread.
            + */
            + public void handleTransitionToStandByInNewThread() { + Thread standByTransitionThread = + new Thread(activeServices.standByTransitionRunnable); + standByTransitionThread.setName("StandByTransitionThread"); + standByTransitionThread.start(); + }

            +
            + private class StandByTransitionRunnable implements Runnable {

              • End diff –

          Let us add javadoc for this class, and include details on how we use the same runnable.

          Show
          githubbot ASF GitHub Bot added a comment - Github user kambatla commented on a diff in the pull request: https://github.com/apache/hadoop/pull/182#discussion_r99949581 — Diff: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceManager.java — @@ -819,19 +824,39 @@ public void handle(RMFatalEvent event) { } } public void handleTransitionToStandBy() { if (rmContext.isHAEnabled()) { try { // Transition to standby and reinit active services LOG.info("Transitioning RM to Standby mode"); transitionToStandby(true); EmbeddedElector elector = rmContext.getLeaderElectorService(); if (elector != null) { elector.rejoinElection(); + /** + * Transition to standby in a new thread. + */ + public void handleTransitionToStandByInNewThread() { + Thread standByTransitionThread = + new Thread(activeServices.standByTransitionRunnable); + standByTransitionThread.setName("StandByTransitionThread"); + standByTransitionThread.start(); + } + + private class StandByTransitionRunnable implements Runnable { End diff – Let us add javadoc for this class, and include details on how we use the same runnable.
          Hide
          yufeigu Yufei Gu added a comment -

          Patch 008 add some docs for new code.

          Show
          yufeigu Yufei Gu added a comment - Patch 008 add some docs for new code.
          Hide
          hadoopqa Hadoop QA added a comment -
          -1 overall



          Vote Subsystem Runtime Comment
          0 reexec 0m 17s Docker mode activated.
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 test4tests 0m 0s The patch appears to include 2 new or modified test files.
          0 mvndep 0m 9s Maven dependency ordering for branch
          +1 mvninstall 13m 26s trunk passed
          +1 compile 9m 8s trunk passed
          +1 checkstyle 0m 49s trunk passed
          +1 mvnsite 1m 16s trunk passed
          +1 mvneclipse 0m 44s trunk passed
          +1 findbugs 1m 52s trunk passed
          +1 javadoc 0m 55s trunk passed
          0 mvndep 0m 10s Maven dependency ordering for patch
          +1 mvninstall 0m 55s the patch passed
          +1 compile 6m 18s the patch passed
          +1 javac 6m 18s the patch passed
          -0 checkstyle 0m 48s hadoop-yarn-project/hadoop-yarn: The patch generated 3 new + 117 unchanged - 1 fixed = 120 total (was 118)
          +1 mvnsite 1m 14s the patch passed
          +1 mvneclipse 0m 39s the patch passed
          +1 whitespace 0m 0s The patch has no whitespace issues.
          +1 findbugs 2m 13s the patch passed
          +1 javadoc 0m 49s the patch passed
          -1 unit 41m 38s hadoop-yarn-server-resourcemanager in the patch failed.
          +1 unit 17m 7s hadoop-yarn-client in the patch passed.
          +1 asflicense 0m 30s The patch does not generate ASF License warnings.
          109m 39s



          Reason Tests
          Failed junit tests hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacitySchedulerNodeLabelUpdate
            hadoop.yarn.server.resourcemanager.TestRMRestart
            hadoop.yarn.server.resourcemanager.scheduler.fair.TestFSAppStarvation



          Subsystem Report/Notes
          Docker Image:yetus/hadoop:a9ad5d6
          JIRA Issue YARN-6061
          GITHUB PR https://github.com/apache/hadoop/pull/182
          Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
          uname Linux 8ba9634254bb 3.13.0-103-generic #150-Ubuntu SMP Thu Nov 24 10:34:17 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
          Build tool maven
          Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
          git revision trunk / eec52e1
          Default Java 1.8.0_121
          findbugs v3.0.0
          checkstyle https://builds.apache.org/job/PreCommit-YARN-Build/14864/artifact/patchprocess/diff-checkstyle-hadoop-yarn-project_hadoop-yarn.txt
          unit https://builds.apache.org/job/PreCommit-YARN-Build/14864/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt
          Test Results https://builds.apache.org/job/PreCommit-YARN-Build/14864/testReport/
          modules C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client U: hadoop-yarn-project/hadoop-yarn
          Console output https://builds.apache.org/job/PreCommit-YARN-Build/14864/console
          Powered by Apache Yetus 0.5.0-SNAPSHOT http://yetus.apache.org

          This message was automatically generated.

          Show
          hadoopqa Hadoop QA added a comment - -1 overall Vote Subsystem Runtime Comment 0 reexec 0m 17s Docker mode activated. +1 @author 0m 0s The patch does not contain any @author tags. +1 test4tests 0m 0s The patch appears to include 2 new or modified test files. 0 mvndep 0m 9s Maven dependency ordering for branch +1 mvninstall 13m 26s trunk passed +1 compile 9m 8s trunk passed +1 checkstyle 0m 49s trunk passed +1 mvnsite 1m 16s trunk passed +1 mvneclipse 0m 44s trunk passed +1 findbugs 1m 52s trunk passed +1 javadoc 0m 55s trunk passed 0 mvndep 0m 10s Maven dependency ordering for patch +1 mvninstall 0m 55s the patch passed +1 compile 6m 18s the patch passed +1 javac 6m 18s the patch passed -0 checkstyle 0m 48s hadoop-yarn-project/hadoop-yarn: The patch generated 3 new + 117 unchanged - 1 fixed = 120 total (was 118) +1 mvnsite 1m 14s the patch passed +1 mvneclipse 0m 39s the patch passed +1 whitespace 0m 0s The patch has no whitespace issues. +1 findbugs 2m 13s the patch passed +1 javadoc 0m 49s the patch passed -1 unit 41m 38s hadoop-yarn-server-resourcemanager in the patch failed. +1 unit 17m 7s hadoop-yarn-client in the patch passed. +1 asflicense 0m 30s The patch does not generate ASF License warnings. 109m 39s Reason Tests Failed junit tests hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacitySchedulerNodeLabelUpdate   hadoop.yarn.server.resourcemanager.TestRMRestart   hadoop.yarn.server.resourcemanager.scheduler.fair.TestFSAppStarvation Subsystem Report/Notes Docker Image:yetus/hadoop:a9ad5d6 JIRA Issue YARN-6061 GITHUB PR https://github.com/apache/hadoop/pull/182 Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle uname Linux 8ba9634254bb 3.13.0-103-generic #150-Ubuntu SMP Thu Nov 24 10:34:17 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux Build tool maven Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh git revision trunk / eec52e1 Default Java 1.8.0_121 findbugs v3.0.0 checkstyle https://builds.apache.org/job/PreCommit-YARN-Build/14864/artifact/patchprocess/diff-checkstyle-hadoop-yarn-project_hadoop-yarn.txt unit https://builds.apache.org/job/PreCommit-YARN-Build/14864/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt Test Results https://builds.apache.org/job/PreCommit-YARN-Build/14864/testReport/ modules C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client U: hadoop-yarn-project/hadoop-yarn Console output https://builds.apache.org/job/PreCommit-YARN-Build/14864/console Powered by Apache Yetus 0.5.0-SNAPSHOT http://yetus.apache.org This message was automatically generated.
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user kambatla commented on a diff in the pull request:

          https://github.com/apache/hadoop/pull/182#discussion_r100468005

          — Diff: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMCriticalThreadUncaughtExceptionHandler.java —
          @@ -0,0 +1,60 @@
          +/**
          + * Licensed to the Apache Software Foundation (ASF) under one
          + * or more contributor license agreements. See the NOTICE file
          + * distributed with this work for additional information
          + * regarding copyright ownership. The ASF licenses this file
          + * to you under the Apache License, Version 2.0 (the
          + * "License"); you may not use this file except in compliance
          + * with the License. You may obtain a copy of the License at
          + *
          + * http://www.apache.org/licenses/LICENSE-2.0
          + *
          + * Unless required by applicable law or agreed to in writing, software
          + * distributed under the License is distributed on an "AS IS" BASIS,
          + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
          + * See the License for the specific language governing permissions and
          + * limitations under the License.
          + */
          +
          +package org.apache.hadoop.yarn.server.resourcemanager;
          +
          +import java.lang.Thread.UncaughtExceptionHandler;
          +
          +import org.apache.commons.logging.Log;
          +import org.apache.commons.logging.LogFactory;
          +import org.apache.hadoop.classification.InterfaceAudience.Public;
          +import org.apache.hadoop.classification.InterfaceStability.Evolving;
          +import org.apache.hadoop.yarn.conf.HAUtil;
          +
          +/**
          + * This class either shuts down

          {@link ResourceManager} or transitions the
          + * {@link ResourceManager}

          to standby state if a critical thread throws an
          + * uncaught exception. It is intended to be installed by calling
          + *

          {@code setUncaughtExceptionHandler(Thread.UncaughtExceptionHandler)}

          + * in the thread entry point or after creation of threads.
          + */
          +@Public
          — End diff –

          We don't need to expose this outside YARN at all. This should be @Private. Let us remove @Evolving altogether.

          Show
          githubbot ASF GitHub Bot added a comment - Github user kambatla commented on a diff in the pull request: https://github.com/apache/hadoop/pull/182#discussion_r100468005 — Diff: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMCriticalThreadUncaughtExceptionHandler.java — @@ -0,0 +1,60 @@ +/** + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.hadoop.yarn.server.resourcemanager; + +import java.lang.Thread.UncaughtExceptionHandler; + +import org.apache.commons.logging.Log; +import org.apache.commons.logging.LogFactory; +import org.apache.hadoop.classification.InterfaceAudience.Public; +import org.apache.hadoop.classification.InterfaceStability.Evolving; +import org.apache.hadoop.yarn.conf.HAUtil; + +/** + * This class either shuts down {@link ResourceManager} or transitions the + * {@link ResourceManager} to standby state if a critical thread throws an + * uncaught exception. It is intended to be installed by calling + * {@code setUncaughtExceptionHandler(Thread.UncaughtExceptionHandler)} + * in the thread entry point or after creation of threads. + */ +@Public — End diff – We don't need to expose this outside YARN at all. This should be @Private. Let us remove @Evolving altogether.
          Hide
          hadoopqa Hadoop QA added a comment -
          -1 overall



          Vote Subsystem Runtime Comment
          0 reexec 0m 25s Docker mode activated.
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 test4tests 0m 0s The patch appears to include 2 new or modified test files.
          0 mvndep 0m 12s Maven dependency ordering for branch
          +1 mvninstall 15m 37s trunk passed
          +1 compile 9m 40s trunk passed
          +1 checkstyle 0m 55s trunk passed
          +1 mvnsite 1m 18s trunk passed
          +1 mvneclipse 0m 45s trunk passed
          +1 findbugs 2m 1s trunk passed
          +1 javadoc 0m 50s trunk passed
          0 mvndep 0m 10s Maven dependency ordering for patch
          +1 mvninstall 1m 7s the patch passed
          +1 compile 6m 44s the patch passed
          +1 javac 6m 44s the patch passed
          -0 checkstyle 0m 55s hadoop-yarn-project/hadoop-yarn: The patch generated 3 new + 116 unchanged - 1 fixed = 119 total (was 117)
          +1 mvnsite 1m 24s the patch passed
          +1 mvneclipse 0m 41s the patch passed
          +1 whitespace 0m 0s The patch has no whitespace issues.
          +1 findbugs 2m 22s the patch passed
          +1 javadoc 0m 48s the patch passed
          -1 unit 43m 21s hadoop-yarn-server-resourcemanager in the patch failed.
          +1 unit 17m 41s hadoop-yarn-client in the patch passed.
          +1 asflicense 0m 32s The patch does not generate ASF License warnings.
          116m 7s



          Reason Tests
          Failed junit tests hadoop.yarn.server.resourcemanager.TestResourceTrackerService
            hadoop.yarn.server.resourcemanager.TestRMRestart
            hadoop.yarn.server.resourcemanager.scheduler.fair.TestFSAppStarvation



          Subsystem Report/Notes
          Docker Image:yetus/hadoop:a9ad5d6
          JIRA Issue YARN-6061
          GITHUB PR https://github.com/apache/hadoop/pull/182
          Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
          uname Linux 34017eea78ee 3.13.0-105-generic #152-Ubuntu SMP Fri Dec 2 15:37:11 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
          Build tool maven
          Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
          git revision trunk / 5b15129
          Default Java 1.8.0_121
          findbugs v3.0.0
          checkstyle https://builds.apache.org/job/PreCommit-YARN-Build/14878/artifact/patchprocess/diff-checkstyle-hadoop-yarn-project_hadoop-yarn.txt
          unit https://builds.apache.org/job/PreCommit-YARN-Build/14878/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt
          Test Results https://builds.apache.org/job/PreCommit-YARN-Build/14878/testReport/
          modules C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client U: hadoop-yarn-project/hadoop-yarn
          Console output https://builds.apache.org/job/PreCommit-YARN-Build/14878/console
          Powered by Apache Yetus 0.5.0-SNAPSHOT http://yetus.apache.org

          This message was automatically generated.

          Show
          hadoopqa Hadoop QA added a comment - -1 overall Vote Subsystem Runtime Comment 0 reexec 0m 25s Docker mode activated. +1 @author 0m 0s The patch does not contain any @author tags. +1 test4tests 0m 0s The patch appears to include 2 new or modified test files. 0 mvndep 0m 12s Maven dependency ordering for branch +1 mvninstall 15m 37s trunk passed +1 compile 9m 40s trunk passed +1 checkstyle 0m 55s trunk passed +1 mvnsite 1m 18s trunk passed +1 mvneclipse 0m 45s trunk passed +1 findbugs 2m 1s trunk passed +1 javadoc 0m 50s trunk passed 0 mvndep 0m 10s Maven dependency ordering for patch +1 mvninstall 1m 7s the patch passed +1 compile 6m 44s the patch passed +1 javac 6m 44s the patch passed -0 checkstyle 0m 55s hadoop-yarn-project/hadoop-yarn: The patch generated 3 new + 116 unchanged - 1 fixed = 119 total (was 117) +1 mvnsite 1m 24s the patch passed +1 mvneclipse 0m 41s the patch passed +1 whitespace 0m 0s The patch has no whitespace issues. +1 findbugs 2m 22s the patch passed +1 javadoc 0m 48s the patch passed -1 unit 43m 21s hadoop-yarn-server-resourcemanager in the patch failed. +1 unit 17m 41s hadoop-yarn-client in the patch passed. +1 asflicense 0m 32s The patch does not generate ASF License warnings. 116m 7s Reason Tests Failed junit tests hadoop.yarn.server.resourcemanager.TestResourceTrackerService   hadoop.yarn.server.resourcemanager.TestRMRestart   hadoop.yarn.server.resourcemanager.scheduler.fair.TestFSAppStarvation Subsystem Report/Notes Docker Image:yetus/hadoop:a9ad5d6 JIRA Issue YARN-6061 GITHUB PR https://github.com/apache/hadoop/pull/182 Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle uname Linux 34017eea78ee 3.13.0-105-generic #152-Ubuntu SMP Fri Dec 2 15:37:11 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux Build tool maven Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh git revision trunk / 5b15129 Default Java 1.8.0_121 findbugs v3.0.0 checkstyle https://builds.apache.org/job/PreCommit-YARN-Build/14878/artifact/patchprocess/diff-checkstyle-hadoop-yarn-project_hadoop-yarn.txt unit https://builds.apache.org/job/PreCommit-YARN-Build/14878/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt Test Results https://builds.apache.org/job/PreCommit-YARN-Build/14878/testReport/ modules C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client U: hadoop-yarn-project/hadoop-yarn Console output https://builds.apache.org/job/PreCommit-YARN-Build/14878/console Powered by Apache Yetus 0.5.0-SNAPSHOT http://yetus.apache.org This message was automatically generated.
          Hide
          kasha Karthik Kambatla added a comment -

          Looks like the new test file has been excluded starting patch v4. Can we get that back in? Also, fix up the checkstyle issues, especially the ones regarding line length.

          I closed the PR. Let us just post patches here on the JIRA.

          Show
          kasha Karthik Kambatla added a comment - Looks like the new test file has been excluded starting patch v4. Can we get that back in? Also, fix up the checkstyle issues, especially the ones regarding line length. I closed the PR. Let us just post patches here on the JIRA.
          Hide
          yufeigu Yufei Gu added a comment - - edited

          Tests are moved into class TestRMFailover. No need for a new test file. I'll post a new patch for the line length.

          Show
          yufeigu Yufei Gu added a comment - - edited Tests are moved into class TestRMFailover. No need for a new test file. I'll post a new patch for the line length.
          Hide
          hadoopqa Hadoop QA added a comment -
          +1 overall



          Vote Subsystem Runtime Comment
          0 reexec 21m 42s Docker mode activated.
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 test4tests 0m 0s The patch appears to include 2 new or modified test files.
          0 mvndep 0m 7s Maven dependency ordering for branch
          +1 mvninstall 14m 4s trunk passed
          +1 compile 8m 52s trunk passed
          +1 checkstyle 0m 50s trunk passed
          +1 mvnsite 1m 16s trunk passed
          +1 mvneclipse 0m 42s trunk passed
          +1 findbugs 1m 55s trunk passed
          +1 javadoc 0m 48s trunk passed
          0 mvndep 0m 9s Maven dependency ordering for patch
          +1 mvninstall 0m 50s the patch passed
          +1 compile 6m 37s the patch passed
          +1 javac 6m 37s the patch passed
          -0 checkstyle 0m 45s hadoop-yarn-project/hadoop-yarn: The patch generated 3 new + 116 unchanged - 1 fixed = 119 total (was 117)
          +1 mvnsite 1m 5s the patch passed
          +1 mvneclipse 0m 36s the patch passed
          +1 whitespace 0m 0s The patch has no whitespace issues.
          +1 findbugs 1m 57s the patch passed
          +1 javadoc 0m 47s the patch passed
          +1 unit 44m 33s hadoop-yarn-server-resourcemanager in the patch passed.
          +1 unit 16m 47s hadoop-yarn-client in the patch passed.
          +1 asflicense 0m 35s The patch does not generate ASF License warnings.
          133m 53s



          Subsystem Report/Notes
          Docker Image:yetus/hadoop:a9ad5d6
          JIRA Issue YARN-6061
          GITHUB PR https://github.com/apache/hadoop/pull/182
          Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
          uname Linux 3898e8df9b36 3.13.0-105-generic #152-Ubuntu SMP Fri Dec 2 15:37:11 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
          Build tool maven
          Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
          git revision trunk / 839b690
          Default Java 1.8.0_121
          findbugs v3.0.0
          checkstyle https://builds.apache.org/job/PreCommit-YARN-Build/14895/artifact/patchprocess/diff-checkstyle-hadoop-yarn-project_hadoop-yarn.txt
          Test Results https://builds.apache.org/job/PreCommit-YARN-Build/14895/testReport/
          modules C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client U: hadoop-yarn-project/hadoop-yarn
          Console output https://builds.apache.org/job/PreCommit-YARN-Build/14895/console
          Powered by Apache Yetus 0.5.0-SNAPSHOT http://yetus.apache.org

          This message was automatically generated.

          Show
          hadoopqa Hadoop QA added a comment - +1 overall Vote Subsystem Runtime Comment 0 reexec 21m 42s Docker mode activated. +1 @author 0m 0s The patch does not contain any @author tags. +1 test4tests 0m 0s The patch appears to include 2 new or modified test files. 0 mvndep 0m 7s Maven dependency ordering for branch +1 mvninstall 14m 4s trunk passed +1 compile 8m 52s trunk passed +1 checkstyle 0m 50s trunk passed +1 mvnsite 1m 16s trunk passed +1 mvneclipse 0m 42s trunk passed +1 findbugs 1m 55s trunk passed +1 javadoc 0m 48s trunk passed 0 mvndep 0m 9s Maven dependency ordering for patch +1 mvninstall 0m 50s the patch passed +1 compile 6m 37s the patch passed +1 javac 6m 37s the patch passed -0 checkstyle 0m 45s hadoop-yarn-project/hadoop-yarn: The patch generated 3 new + 116 unchanged - 1 fixed = 119 total (was 117) +1 mvnsite 1m 5s the patch passed +1 mvneclipse 0m 36s the patch passed +1 whitespace 0m 0s The patch has no whitespace issues. +1 findbugs 1m 57s the patch passed +1 javadoc 0m 47s the patch passed +1 unit 44m 33s hadoop-yarn-server-resourcemanager in the patch passed. +1 unit 16m 47s hadoop-yarn-client in the patch passed. +1 asflicense 0m 35s The patch does not generate ASF License warnings. 133m 53s Subsystem Report/Notes Docker Image:yetus/hadoop:a9ad5d6 JIRA Issue YARN-6061 GITHUB PR https://github.com/apache/hadoop/pull/182 Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle uname Linux 3898e8df9b36 3.13.0-105-generic #152-Ubuntu SMP Fri Dec 2 15:37:11 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux Build tool maven Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh git revision trunk / 839b690 Default Java 1.8.0_121 findbugs v3.0.0 checkstyle https://builds.apache.org/job/PreCommit-YARN-Build/14895/artifact/patchprocess/diff-checkstyle-hadoop-yarn-project_hadoop-yarn.txt Test Results https://builds.apache.org/job/PreCommit-YARN-Build/14895/testReport/ modules C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client U: hadoop-yarn-project/hadoop-yarn Console output https://builds.apache.org/job/PreCommit-YARN-Build/14895/console Powered by Apache Yetus 0.5.0-SNAPSHOT http://yetus.apache.org This message was automatically generated.
          Hide
          hadoopqa Hadoop QA added a comment -
          -1 overall



          Vote Subsystem Runtime Comment
          0 reexec 0m 16s Docker mode activated.
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 test4tests 0m 0s The patch appears to include 2 new or modified test files.
          0 mvndep 0m 9s Maven dependency ordering for branch
          +1 mvninstall 13m 37s trunk passed
          +1 compile 8m 9s trunk passed
          +1 checkstyle 0m 48s trunk passed
          +1 mvnsite 1m 9s trunk passed
          +1 mvneclipse 0m 38s trunk passed
          +1 findbugs 7m 10s trunk passed
          +1 javadoc 4m 19s trunk passed
          0 mvndep 0m 10s Maven dependency ordering for patch
          +1 mvninstall 0m 55s the patch passed
          +1 compile 6m 23s the patch passed
          +1 javac 6m 23s the patch passed
          -0 checkstyle 0m 45s hadoop-yarn-project/hadoop-yarn: The patch generated 3 new + 116 unchanged - 1 fixed = 119 total (was 117)
          +1 mvnsite 1m 11s the patch passed
          +1 mvneclipse 0m 39s the patch passed
          +1 whitespace 0m 0s The patch has no whitespace issues.
          +1 findbugs 2m 2s the patch passed
          +1 javadoc 0m 53s the patch passed
          +1 unit 47m 5s hadoop-yarn-server-resourcemanager in the patch passed.
          -1 unit 23m 2s hadoop-yarn-client in the patch failed.
          +1 asflicense 0m 34s The patch does not generate ASF License warnings.
          130m 20s



          Reason Tests
          Failed junit tests hadoop.yarn.client.api.impl.TestAMRMProxy



          Subsystem Report/Notes
          Docker Image:yetus/hadoop:a9ad5d6
          JIRA Issue YARN-6061
          GITHUB PR https://github.com/apache/hadoop/pull/182
          Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
          uname Linux 8c3c718b353a 3.13.0-105-generic #152-Ubuntu SMP Fri Dec 2 15:37:11 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
          Build tool maven
          Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
          git revision trunk / 71c23c9
          Default Java 1.8.0_121
          findbugs v3.0.0
          checkstyle https://builds.apache.org/job/PreCommit-YARN-Build/14915/artifact/patchprocess/diff-checkstyle-hadoop-yarn-project_hadoop-yarn.txt
          unit https://builds.apache.org/job/PreCommit-YARN-Build/14915/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-client.txt
          Test Results https://builds.apache.org/job/PreCommit-YARN-Build/14915/testReport/
          modules C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client U: hadoop-yarn-project/hadoop-yarn
          Console output https://builds.apache.org/job/PreCommit-YARN-Build/14915/console
          Powered by Apache Yetus 0.5.0-SNAPSHOT http://yetus.apache.org

          This message was automatically generated.

          Show
          hadoopqa Hadoop QA added a comment - -1 overall Vote Subsystem Runtime Comment 0 reexec 0m 16s Docker mode activated. +1 @author 0m 0s The patch does not contain any @author tags. +1 test4tests 0m 0s The patch appears to include 2 new or modified test files. 0 mvndep 0m 9s Maven dependency ordering for branch +1 mvninstall 13m 37s trunk passed +1 compile 8m 9s trunk passed +1 checkstyle 0m 48s trunk passed +1 mvnsite 1m 9s trunk passed +1 mvneclipse 0m 38s trunk passed +1 findbugs 7m 10s trunk passed +1 javadoc 4m 19s trunk passed 0 mvndep 0m 10s Maven dependency ordering for patch +1 mvninstall 0m 55s the patch passed +1 compile 6m 23s the patch passed +1 javac 6m 23s the patch passed -0 checkstyle 0m 45s hadoop-yarn-project/hadoop-yarn: The patch generated 3 new + 116 unchanged - 1 fixed = 119 total (was 117) +1 mvnsite 1m 11s the patch passed +1 mvneclipse 0m 39s the patch passed +1 whitespace 0m 0s The patch has no whitespace issues. +1 findbugs 2m 2s the patch passed +1 javadoc 0m 53s the patch passed +1 unit 47m 5s hadoop-yarn-server-resourcemanager in the patch passed. -1 unit 23m 2s hadoop-yarn-client in the patch failed. +1 asflicense 0m 34s The patch does not generate ASF License warnings. 130m 20s Reason Tests Failed junit tests hadoop.yarn.client.api.impl.TestAMRMProxy Subsystem Report/Notes Docker Image:yetus/hadoop:a9ad5d6 JIRA Issue YARN-6061 GITHUB PR https://github.com/apache/hadoop/pull/182 Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle uname Linux 8c3c718b353a 3.13.0-105-generic #152-Ubuntu SMP Fri Dec 2 15:37:11 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux Build tool maven Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh git revision trunk / 71c23c9 Default Java 1.8.0_121 findbugs v3.0.0 checkstyle https://builds.apache.org/job/PreCommit-YARN-Build/14915/artifact/patchprocess/diff-checkstyle-hadoop-yarn-project_hadoop-yarn.txt unit https://builds.apache.org/job/PreCommit-YARN-Build/14915/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-client.txt Test Results https://builds.apache.org/job/PreCommit-YARN-Build/14915/testReport/ modules C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client U: hadoop-yarn-project/hadoop-yarn Console output https://builds.apache.org/job/PreCommit-YARN-Build/14915/console Powered by Apache Yetus 0.5.0-SNAPSHOT http://yetus.apache.org This message was automatically generated.
          Hide
          kasha Karthik Kambatla added a comment -

          +1. Just committed this to trunk and branch-2.

          Thanks Yufei Gu for working on this improvement.

          Show
          kasha Karthik Kambatla added a comment - +1. Just committed this to trunk and branch-2. Thanks Yufei Gu for working on this improvement.
          Hide
          yufeigu Yufei Gu added a comment -

          Yeah! Thanks Karthik Kambatla for review, advice and commit.

          Show
          yufeigu Yufei Gu added a comment - Yeah! Thanks Karthik Kambatla for review, advice and commit.
          Hide
          hudson Hudson added a comment -

          FAILURE: Integrated in Jenkins build Hadoop-trunk-Commit #11248 (See https://builds.apache.org/job/Hadoop-trunk-Commit/11248/)
          YARN-6061. Add an UncaughtExceptionHandler for critical threads in RM. (kasha: rev 652679aa8ad6f9e61b8ed8e2b04b3e0332025e94)

          • (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/TestRMFailover.java
          • (add) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMCriticalThreadUncaughtExceptionHandler.java
          • (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceManager.java
          • (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMFatalEventType.java
          • (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateStore.java
          • (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/policies/DominantResourceFairnessPolicy.java
          • (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMContextImpl.java
          • (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMContext.java
          Show
          hudson Hudson added a comment - FAILURE: Integrated in Jenkins build Hadoop-trunk-Commit #11248 (See https://builds.apache.org/job/Hadoop-trunk-Commit/11248/ ) YARN-6061 . Add an UncaughtExceptionHandler for critical threads in RM. (kasha: rev 652679aa8ad6f9e61b8ed8e2b04b3e0332025e94) (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/TestRMFailover.java (add) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMCriticalThreadUncaughtExceptionHandler.java (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceManager.java (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMFatalEventType.java (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateStore.java (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/policies/DominantResourceFairnessPolicy.java (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMContextImpl.java (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMContext.java
          Hide
          kasha Karthik Kambatla added a comment -

          Included an extraneous change in the original commit. Just pushed an addendum to remove those changes.

          Show
          kasha Karthik Kambatla added a comment - Included an extraneous change in the original commit. Just pushed an addendum to remove those changes.
          Hide
          hudson Hudson added a comment -

          SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #11249 (See https://builds.apache.org/job/Hadoop-trunk-Commit/11249/)
          YARN-6061. Addendum. Remove extraneous change. (kasha: rev 353a9b2d9165a221491395edbadf8acc3a39990b)

          • (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/policies/DominantResourceFairnessPolicy.java
          Show
          hudson Hudson added a comment - SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #11249 (See https://builds.apache.org/job/Hadoop-trunk-Commit/11249/ ) YARN-6061 . Addendum. Remove extraneous change. (kasha: rev 353a9b2d9165a221491395edbadf8acc3a39990b) (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/policies/DominantResourceFairnessPolicy.java
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user flyrain commented on the issue:

          https://github.com/apache/hadoop/pull/182

          Committed

          Show
          githubbot ASF GitHub Bot added a comment - Github user flyrain commented on the issue: https://github.com/apache/hadoop/pull/182 Committed
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user flyrain closed the pull request at:

          https://github.com/apache/hadoop/pull/182

          Show
          githubbot ASF GitHub Bot added a comment - Github user flyrain closed the pull request at: https://github.com/apache/hadoop/pull/182

            People

            • Assignee:
              yufeigu Yufei Gu
              Reporter:
              yufeigu Yufei Gu
            • Votes:
              0 Vote for this issue
              Watchers:
              8 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development