Hadoop YARN / YARN-3585

NodeManager cannot exit on SHUTDOWN event triggered and NM recovery is enabled

    Details

    • Hadoop Flags: Reviewed

      Description

      With NM recovery enabled, after decommissioning, the NodeManager log shows that it has stopped, but the process does not exit.
      Non-daemon threads:

      "DestroyJavaVM" prio=10 tid=0x00007f3460011800 nid=0x29ec waiting on condition [0x0000000000000000]
      "leveldb" prio=10 tid=0x00007f3354001800 nid=0x2a97 runnable [0x0000000000000000]
      "VM Thread" prio=10 tid=0x00007f3460167000 nid=0x29f8 runnable 
      "Gang worker#0 (Parallel GC Threads)" prio=10 tid=0x00007f3460020000 nid=0x29ed runnable 
      "Gang worker#1 (Parallel GC Threads)" prio=10 tid=0x00007f3460022000 nid=0x29ee runnable 
      "Gang worker#2 (Parallel GC Threads)" prio=10 tid=0x00007f3460024000 nid=0x29ef runnable 
      "Gang worker#3 (Parallel GC Threads)" prio=10 tid=0x00007f3460025800 nid=0x29f0 runnable 
      "Gang worker#4 (Parallel GC Threads)" prio=10 tid=0x00007f3460027800 nid=0x29f1 runnable 
      "Gang worker#5 (Parallel GC Threads)" prio=10 tid=0x00007f3460029000 nid=0x29f2 runnable 
      "Gang worker#6 (Parallel GC Threads)" prio=10 tid=0x00007f346002b000 nid=0x29f3 runnable 
      "Gang worker#7 (Parallel GC Threads)" prio=10 tid=0x00007f346002d000 nid=0x29f4 runnable 
      "Concurrent Mark-Sweep GC Thread" prio=10 tid=0x00007f3460120800 nid=0x29f7 runnable 
      "Gang worker#0 (Parallel CMS Threads)" prio=10 tid=0x00007f346011c800 nid=0x29f5 runnable 
      "Gang worker#1 (Parallel CMS Threads)" prio=10 tid=0x00007f346011e800 nid=0x29f6 runnable 
      "VM Periodic Task Thread" prio=10 tid=0x00007f346019f800 nid=0x2a01 waiting on condition 
      

      And the JNI leveldb thread stack:

      Thread 12 (Thread 0x7f33dd842700 (LWP 10903)):
      #0  0x0000003d8340b43c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
      #1  0x00007f33dfce2a3b in leveldb::(anonymous namespace)::PosixEnv::BGThreadWrapper(void*) () from /tmp/libleveldbjni-64-1-6922178968300745716.8
      #2  0x0000003d83407851 in start_thread () from /lib64/libpthread.so.0
      #3  0x0000003d830e811d in clone () from /lib64/libc.so.6
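The JVM only exits on its own once every non-daemon thread has finished, which is why a single leftover non-daemon "leveldb" thread is enough to keep the NodeManager process alive after its services report stopped. A minimal diagnostic sketch, assuming one wants to log such threads from inside the process at the end of shutdown; the class and method names are hypothetical and not part of Hadoop. VM-internal threads such as "VM Thread" or the GC workers are not java.lang.Thread objects, so they never show up in this list; native threads attached to the JVM as Java threads are expected to appear.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical diagnostic helper (not part of Hadoop): lists live non-daemon threads. */
final class NonDaemonThreads {
    private NonDaemonThreads() {}

    /** Returns the names of live non-daemon threads other than the calling thread. */
    static List<String> liveNonDaemonThreadNames() {
        List<String> names = new ArrayList<>();
        Thread self = Thread.currentThread();
        // getAllStackTraces() snapshots every live java.lang.Thread in the JVM.
        for (Thread t : Thread.getAllStackTraces().keySet()) {
            if (t != self && t.isAlive() && !t.isDaemon()) {
                names.add(t.getName());
            }
        }
        return names;
    }
}
```

Logging this list right after the services are stopped would show directly which thread is still holding the JVM open, without needing an external jstack.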
      
      Attachments:
      1. 0001-YARN-3585.patch (2 kB, Rohith Sharma K S)
      2. YARN-3585.patch (4 kB, Rohith Sharma K S)


          Activity

          devaraj.k Devaraj K added a comment -

          Thanks, Peng Zhang, for reporting this issue. I tried to reproduce it, but I don't think it occurs in a straightforward scenario.

          Could you give the steps to reproduce this issue?

          peng.zhang Peng Zhang added a comment -

          As in YARN-3640, Rohith has encountered the same problem, and in both cases we see a leveldb thread in the thread dump.
          I think it's probably related to NM recovery; decommissioning is not the key factor.

          Devaraj K, did you enable NM recovery in your environment?

          devaraj.k Devaraj K added a comment -

          Thanks for the reply. Yes, I have NM recovery enabled in my environment.

          vinodkv Vinod Kumar Vavilapalli added a comment -

          Marking it as critical for 2.7.1, whichever way we go.

          jlowe Jason Lowe added a comment -

          This is very likely a case where the leveldb state store was not closed properly on shutdown. That was probably triggered by another exception that occurred during shutdown that short-circuited the shutdown of other services (like the state store). See YARN-3641.

          Could you check the NM logs for the case where it hung and see if another exception was logged during shutdown that may explain how the leveldb store failed to close?
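The hypothesis here is that an exception thrown while stopping one service short-circuited the stop of the services after it, so the state store was never closed. A minimal sketch of the general mitigation, assuming a hypothetical StoppableService interface rather than Hadoop's actual Service API: stop every service in reverse start order and record failures instead of letting the first one abort the loop.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical minimal service abstraction for this sketch (not Hadoop's Service). */
interface StoppableService {
    String name();

    void stop() throws Exception;
}

final class ResilientStopper {
    private ResilientStopper() {}

    /**
     * Stops services in reverse start order. A failure in one service is recorded
     * and logged, and the remaining services are still stopped, so a state store
     * is always given a chance to close. Returns the names of services that failed.
     */
    static List<String> stopAll(List<StoppableService> startedInOrder) {
        List<String> failed = new ArrayList<>();
        for (int i = startedInOrder.size() - 1; i >= 0; i--) {
            StoppableService service = startedInOrder.get(i);
            try {
                service.stop();
            } catch (Exception e) {
                failed.add(service.name());
                System.err.println("Failed to stop " + service.name() + ": " + e);
            }
        }
        return failed;
    }
}
```

In the scenario from the logs, a ConnectException from a status-updater-like service would be recorded, and the state store would still be stopped afterwards.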

          rohithsharma Rohith Sharma K S added a comment -

          Could you check the NM logs for the case where it hung and see if another exception was logged during shutdown that may explain how the leveldb store failed to close?

          I have attached the NodeManager logs to YARN-3640; do they help? Yes, during shutdown another exception, a ConnectionException, is thrown from the NodeStatusUpdaterImpl thread. But I don't see how this exception affects stopping the NodeManager services.

          rohithsharma Rohith Sharma K S added a comment -

          I can reproduce it consistently, so I will test any patch and verify it on a real cluster.

          rohithsharma Rohith Sharma K S added a comment -

          I think we can invoke System.exit in a finally block once the NodeManager has shut down. For test execution, this could be bypassed with a flag. Any thoughts?

          vinodkv Vinod Kumar Vavilapalli added a comment -

          Peng Zhang / Rohith Sharma K S, do you think YARN-3641 fixed this, or do we need more patches? If so, is one of you willing to put up a patch?

          rohithsharma Rohith Sharma K S added a comment -

          I will test the YARN-3641 fix against this JIRA's scenario. As for a patch, one option is to call System.exit() explicitly after the shutdown thread exits.

          rohithsharma Rohith Sharma K S added a comment -

          I tested locally with the YARN-3641 fix; the issue still exists.

          jlowe Jason Lowe added a comment -

          Do you have the shutdown logs from the NM that hung? It seems very likely that somehow we did not close the leveldb state store cleanly, if you're seeing a leveldb non-daemon thread holding up the JVM shutdown.

          rohithsharma Rohith Sharma K S added a comment -

          I have attached the NM logs and thread dump to YARN-3640. Could you get them from there?

          jlowe Jason Lowe added a comment -

          Ah, my apologies. I didn't realize it is still failing with the exact same logs, even after YARN-3641. Could you add logging to the state store code to verify that the leveldb database is indeed being closed even when it hangs? I'm trying to determine whether this is a bug in Hadoop code or in the leveldb code.

          rohithsharma Rohith Sharma K S added a comment -

          Could you add logging to the state store code to verify that the leveldb database is indeed being closed even when it hangs?

          Sorry, I didn't quite get what logs I should add and where. Do you mean I should add a log after NMLeveldbStateStoreService#closeStorage() is called?

          jlowe Jason Lowe added a comment -

          Yes, the idea is to show whether we successfully closed the database or not when the problem occurs. Sorry I wasn't clear on that.
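A minimal sketch of that instrumentation, assuming the database handle is a java.io.Closeable (as leveldb's DB interface is, to our understanding); the helper name is hypothetical, and it uses java.util.logging only to stay self-contained, whereas NodeManager code would use its own logger. Whether the "Closed" line is logged tells you if close() returned normally.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.logging.Logger;

/** Hypothetical helper that logs before and after closing a database handle. */
final class InstrumentedClose {
    private static final Logger LOG = Logger.getLogger(InstrumentedClose.class.getName());

    private InstrumentedClose() {}

    /** Closes the handle with logging; returns true if close() returned normally. */
    static boolean closeWithLogging(Closeable db, String what) {
        LOG.info("Closing " + what);
        try {
            db.close();
            LOG.info("Closed " + what);
            return true;
        } catch (IOException | RuntimeException e) {
            // Runtime exceptions are caught too, since leveldb's DBException is unchecked.
            LOG.warning("Failed to close " + what + ": " + e);
            return false;
        }
    }
}
```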

          rohithsharma Rohith Sharma K S added a comment -

          I tested with a patch that logs before and after db.close() and found that the DB is closed. No exception was thrown by db.close().

          jlowe Jason Lowe added a comment -

          If the database was successfully closed yet the non-daemon leveldb thread remains in the jstack output, then it sounds like a bug in the leveldb code. As mentioned before, we can do an explicit exit once we think everything is shut down to mitigate these kinds of problems.

          rohithsharma Rohith Sharma K S added a comment -

          Another observation: when I enabled debug logging for the NodeManager, the issue occurred noticeably less often. I think the timing of the DB close is triggering an issue in leveldb. The issue does not appear on every node every time, but in a cluster at least one node usually hits it.

          I also think it is a leveldb issue, and we should report it to the leveldb project.

          Adding System.exit to the NodeManager's graceful shutdown path could mask many issues, but given that this approach is acceptable, I will upload a patch.

          hadoopqa Hadoop QA added a comment -



          -1 overall



          Vote Subsystem Runtime Comment
          -1 pre-patch 16m 36s Findbugs (version ) appears to be broken on trunk.
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 tests included 0m 0s The patch appears to include 1 new or modified test files.
          +1 javac 9m 13s There were no new javac warning messages.
          +1 javadoc 11m 9s There were no new javadoc warning messages.
          +1 release audit 0m 28s The applied patch does not increase the total number of release audit warnings.
          +1 checkstyle 0m 29s There were no new checkstyle issues.
          +1 whitespace 0m 0s The patch has no lines that end in whitespace.
          +1 install 2m 9s mvn install still works.
          +1 eclipse:eclipse 1m 2s The patch built with eclipse:eclipse.
          -1 findbugs 1m 35s The patch appears to introduce 1 new Findbugs (version 3.0.0) warnings.
          -1 yarn tests 6m 40s Tests failed in hadoop-yarn-server-nodemanager.
              49m 27s  



          Reason Tests
          FindBugs module:hadoop-yarn-server-nodemanager
          Failed unit tests hadoop.yarn.server.nodemanager.TestNodeStatusUpdaterForLabels



          Subsystem Report/Notes
          Patch URL http://issues.apache.org/jira/secure/attachment/12735922/YARN-3585.patch
          Optional Tests javadoc javac unit findbugs checkstyle
          git revision trunk / 5df1fad
          Findbugs warnings https://builds.apache.org/job/PreCommit-YARN-Build/8115/artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-nodemanager.html
          hadoop-yarn-server-nodemanager test log https://builds.apache.org/job/PreCommit-YARN-Build/8115/artifact/patchprocess/testrun_hadoop-yarn-server-nodemanager.txt
          Test Results https://builds.apache.org/job/PreCommit-YARN-Build/8115/testReport/
          Java 1.7.0_55
          uname Linux asf904.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
          Console output https://builds.apache.org/job/PreCommit-YARN-Build/8115/console

          This message was automatically generated.

          bibinchundatt Bibin A Chundatt added a comment -

          Recently in our testbed, the NodeManager JVM did not exit on a shutdown event.
          The exception is different, but the thread stack remains the same as above.

          Here is the exception trace:

          2015-05-30 02:11:49,122 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl: Unable to update state store diagnostics for container_e310_1432817693365_3338_01_000002
          java.io.IOException: org.iq80.leveldb.DBException: Closed
                  at org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService.storeContainerDiagnostics(NMLeveldbStateStoreService.java:261)
                  at org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl$ContainerDiagnosticsUpdateTransition.transition(ContainerImpl.java:1109)
                  at org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl$ContainerDiagnosticsUpdateTransition.transition(ContainerImpl.java:1101)
                  at org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:362)
                  at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
                  at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
                  at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
                  at org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl.handle(ContainerImpl.java:1129)
                  at org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl.handle(ContainerImpl.java:83)
                  at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:246)
                  at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
                  at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
                  at java.util.concurrent.FutureTask.run(FutureTask.java:266)
                  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
                  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
                  at java.lang.Thread.run(Thread.java:745)
          Caused by: org.iq80.leveldb.DBException: Closed
                  at org.fusesource.leveldbjni.internal.JniDB.put(JniDB.java:123)
                  at org.fusesource.leveldbjni.internal.JniDB.put(JniDB.java:106)
                  at org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService.storeContainerDiagnostics(NMLeveldbStateStoreService.java:259)
                  ... 15 more
          
          
          rohithsharma Rohith Sharma K S added a comment -

          This is a race condition between the NodeManager shutting down and a container being launched. By the time the container launch returns to ContainerImpl, the NodeManager has already closed the DB connection, which results in org.iq80.leveldb.DBException: Closed.

          sunilg Sunil G added a comment -

          Hi Bibin A Chundatt and Rohith Sharma K S,
          This recent exception trace is different from the focus of this JIRA, and Rohith has identified its root cause. I think it should be tracked in a separate ticket.

          For the DB close vs. container launch race, we could add a check for whether the DB is closed when moving the container out of the ACQUIRED state.
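A bare "is the DB closed?" check is itself a check-then-act race unless close() is coordinated with in-flight writes. A minimal sketch, assuming a hypothetical GuardedStore wrapper (not Hadoop's NMLeveldbStateStoreService) and swapping in a read-write lock for that coordination; a ConcurrentHashMap stands in for the leveldb handle. Writes take the shared read lock, and close() takes the exclusive write lock, so it waits for in-flight writes and every later write sees the store as closed.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Hypothetical state-store wrapper: writes after close() are rejected, not sent to a closed DB. */
final class GuardedStore {
    private final ReadWriteLock lock = new ReentrantReadWriteLock();
    // Stands in for the leveldb handle in this sketch.
    private final Map<String, String> db = new ConcurrentHashMap<>();
    private boolean closed; // read under the read lock, written under the write lock

    /** Applies the write and returns true, or returns false if the store is closed. */
    boolean put(String key, String value) {
        lock.readLock().lock();
        try {
            if (closed) {
                return false; // caller can skip the update instead of logging a DBException
            }
            db.put(key, value);
            return true;
        } finally {
            lock.readLock().unlock();
        }
    }

    /** Returns the stored value, or null if absent or if the store is closed. */
    String get(String key) {
        lock.readLock().lock();
        try {
            return closed ? null : db.get(key);
        } finally {
            lock.readLock().unlock();
        }
    }

    /** Waits for in-flight writes to finish, then marks the store closed. */
    void close() {
        lock.writeLock().lock();
        try {
            closed = true;
            // A real store would close the underlying DB handle here.
        } finally {
            lock.writeLock().unlock();
        }
    }
}
```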

          rohithsharma Rohith Sharma K S added a comment -

          Yes, we can raise a separate JIRA. Bibin A Chundatt, can you raise one so we can validate the issue there?

          rohithsharma Rohith Sharma K S added a comment -

          The findbugs -1 does not come with any error report, so I'm not sure why it was given.
          The test failure is unrelated to this patch.

          Jason Lowe, kindly review the patch.

          jlowe Jason Lowe added a comment -

          Thanks for the patch, Rohith!

          I think it would be safer/simpler to assume we shouldn't be calling Exit unless NodeManager.main() was invoked (i.e.: we're likely running in a JVM whose sole purpose is to be the nodemanager). In that sense I'm wondering if we should flip the logic to not exit but then have NodeManager.main override that. This probably precludes the need to update existing tests.

          We should be using ExitUtil instead of System.exit directly.

          Nit: "setexitOnShutdownEvent" s/b "setExitOnShutdownEvent"
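The review suggests flipping the logic: exit is off by default and only the daemon's main() turns it on, so tests and other embedded uses never lose their JVM. A minimal sketch of that shape, not the committed patch: SketchNodeManager, forDaemonMain and stopServices are hypothetical names, an injected IntConsumer stands in for ExitUtil so the behaviour can be tested without terminating the JVM, and the exit status is an arbitrary choice.

```java
import java.util.function.IntConsumer;

/** Hypothetical sketch: stop on a SHUTDOWN event, then force exit only if started from main(). */
class SketchNodeManager {
    private static final int SHUTDOWN_EXIT_STATUS = -1; // arbitrary for this sketch

    private final IntConsumer terminator;      // System::exit in main(); Hadoop code would use ExitUtil
    private final boolean exitOnShutdownEvent; // false unless created via forDaemonMain()

    SketchNodeManager(IntConsumer terminator) {
        this(terminator, false);
    }

    private SketchNodeManager(IntConsumer terminator, boolean exitOnShutdownEvent) {
        this.terminator = terminator;
        this.exitOnShutdownEvent = exitOnShutdownEvent;
    }

    /** Factory used only by main(), the one caller that owns the whole JVM. */
    static SketchNodeManager forDaemonMain(IntConsumer terminator) {
        return new SketchNodeManager(terminator, true);
    }

    /** Placeholder for stopping child services (state store, status updater, ...). */
    protected void stopServices() {
    }

    /** Handles a SHUTDOWN event on a separate thread and returns that thread. */
    Thread shutDown() {
        Thread t = new Thread(() -> {
            try {
                stopServices();
            } catch (RuntimeException e) {
                System.err.println("Error while shutting down: " + e);
            } finally {
                // An explicit exit means stray non-daemon threads (such as a
                // native leveldb thread) can no longer keep the process alive.
                if (exitOnShutdownEvent) {
                    terminator.accept(SHUTDOWN_EXIT_STATUS);
                }
            }
        }, "shutdown");
        t.start();
        return t;
    }

    public static void main(String[] args) {
        SketchNodeManager nm = forDaemonMain(System::exit);
        // ... init and start services; a SHUTDOWN event would later call nm.shutDown() ...
    }
}
```

One caveat with this design: per the Runtime.exit documentation, calling exit while shutdown hooks are already running blocks indefinitely, so the forced exit belongs on the SHUTDOWN-event path rather than inside a JVM shutdown hook.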

          bibinchundatt Bibin A Chundatt added a comment -

          Rohith Sharma K S and Sunil G, I have added YARN-3754 to track the DB connection close issue.

          rohithsharma Rohith Sharma K S added a comment -

          Thanks, Jason Lowe, for the review.

          if we should flip the logic to not exit but then have NodeManager.main override that. This probably precludes the need to update existing tests.

          Makes sense to me. I changed the logic to call JVM exit only when the NodeManager is instantiated from the main function.

          We should be using ExitUtil instead of System.exit directly.

          Done

          Nit: "setexitOnShutdownEvent" s/b "setExitOnShutdownEvent"

          This method is no longer necessary, since the patch now enables exit only when the NodeManager is started from the main function. I have removed it.

          Kindly review the updated patch.

          hadoopqa Hadoop QA added a comment -



          -1 overall



          Vote Subsystem Runtime Comment
          0 pre-patch 16m 30s Pre-patch trunk compilation is healthy.
          +1 @author 0m 0s The patch does not contain any @author tags.
          -1 tests included 0m 0s The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
          +1 javac 7m 51s There were no new javac warning messages.
          +1 javadoc 9m 56s There were no new javadoc warning messages.
          +1 release audit 0m 23s The applied patch does not increase the total number of release audit warnings.
          +1 checkstyle 0m 40s There were no new checkstyle issues.
          +1 whitespace 0m 0s The patch has no lines that end in whitespace.
          +1 install 1m 37s mvn install still works.
          +1 eclipse:eclipse 0m 34s The patch built with eclipse:eclipse.
          +1 findbugs 1m 14s The patch does not introduce any new Findbugs (version 3.0.0) warnings.
          +1 yarn tests 6m 14s Tests passed in hadoop-yarn-server-nodemanager.
              45m 3s  



          Subsystem Report/Notes
          Patch URL http://issues.apache.org/jira/secure/attachment/12736738/0001-YARN-3585.patch
          Optional Tests javadoc javac unit findbugs checkstyle
          git revision trunk / 990078b
          hadoop-yarn-server-nodemanager test log https://builds.apache.org/job/PreCommit-YARN-Build/8159/artifact/patchprocess/testrun_hadoop-yarn-server-nodemanager.txt
          Test Results https://builds.apache.org/job/PreCommit-YARN-Build/8159/testReport/
          Java 1.7.0_55
          uname Linux asf903.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
          Console output https://builds.apache.org/job/PreCommit-YARN-Build/8159/console

          This message was automatically generated.

          jlowe Jason Lowe added a comment -

          +1 latest patch lgtm. Will commit this tomorrow if there are no objections.

          jlowe Jason Lowe added a comment -

          Thanks, Rohith! I committed this to trunk, branch-2, and branch-2.7.

          hudson Hudson added a comment -

          FAILURE: Integrated in Hadoop-trunk-Commit #7953 (See https://builds.apache.org/job/Hadoop-trunk-Commit/7953/)
          YARN-3585. NodeManager cannot exit on SHUTDOWN event triggered and NM recovery is enabled. Contributed by Rohith Sharmaks (jlowe: rev e13b671aa510f553f4a6a232b4694b6a4cce88ae)

          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeManager.java
          • hadoop-yarn-project/CHANGES.txt
          hudson Hudson added a comment -

          FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #218 (See https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/218/)
          YARN-3585. NodeManager cannot exit on SHUTDOWN event triggered and NM recovery is enabled. Contributed by Rohith Sharmaks (jlowe: rev e13b671aa510f553f4a6a232b4694b6a4cce88ae)

          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeManager.java
          • hadoop-yarn-project/CHANGES.txt
          hudson Hudson added a comment -

          FAILURE: Integrated in Hadoop-Yarn-trunk #948 (See https://builds.apache.org/job/Hadoop-Yarn-trunk/948/)
          YARN-3585. NodeManager cannot exit on SHUTDOWN event triggered and NM recovery is enabled. Contributed by Rohith Sharmaks (jlowe: rev e13b671aa510f553f4a6a232b4694b6a4cce88ae)

          • hadoop-yarn-project/CHANGES.txt
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeManager.java
          hudson Hudson added a comment -

          SUCCESS: Integrated in Hadoop-Hdfs-trunk #2146 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk/2146/)
          YARN-3585. NodeManager cannot exit on SHUTDOWN event triggered and NM recovery is enabled. Contributed by Rohith Sharmaks (jlowe: rev e13b671aa510f553f4a6a232b4694b6a4cce88ae)

          • hadoop-yarn-project/CHANGES.txt
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeManager.java
          hudson Hudson added a comment -

          SUCCESS: Integrated in Hadoop-Hdfs-trunk-Java8 #207 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/207/)
          YARN-3585. NodeManager cannot exit on SHUTDOWN event triggered and NM recovery is enabled. Contributed by Rohith Sharmaks (jlowe: rev e13b671aa510f553f4a6a232b4694b6a4cce88ae)

          • hadoop-yarn-project/CHANGES.txt
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeManager.java
          hudson Hudson added a comment -

          FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #216 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/216/)
          YARN-3585. NodeManager cannot exit on SHUTDOWN event triggered and NM recovery is enabled. Contributed by Rohith Sharmaks (jlowe: rev e13b671aa510f553f4a6a232b4694b6a4cce88ae)

          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeManager.java
          • hadoop-yarn-project/CHANGES.txt
          hudson Hudson added a comment -

          FAILURE: Integrated in Hadoop-Mapreduce-trunk #2164 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2164/)
          YARN-3585. NodeManager cannot exit on SHUTDOWN event triggered and NM recovery is enabled. Contributed by Rohith Sharmaks (jlowe: rev e13b671aa510f553f4a6a232b4694b6a4cce88ae)

          • hadoop-yarn-project/CHANGES.txt
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeManager.java
          vinodkv Vinod Kumar Vavilapalli added a comment -

          Pulled this into 2.6.1. Ran compilation before the push. Patch applied cleanly.


            People

            • Assignee:
              rohithsharma Rohith Sharma K S
              Reporter:
              peng.zhang Peng Zhang
            • Votes:
              0
              Watchers:
              19

              Dates

              • Created:
                Updated:
                Resolved:
