Hadoop Map/Reduce
  1. Hadoop Map/Reduce
  2. MAPREDUCE-4135

MRAppMaster throws IllegalStateException while shutting down

    Details

    • Type: Bug Bug
    • Status: Resolved
    • Priority: Major Major
    • Resolution: Duplicate
    • Affects Version/s: 2.0.0-alpha, 3.0.0
    • Fix Version/s: 0.23.3
    • Component/s: mrv2
    • Labels:
      None

      Description

      Always MRAppMaster throws IllegalStateException in the stderr while shutting down. It doesn't look good having this exception in the stderr file of MRAppMaster container.

      WARNING: org.apache.hadoop.metrics.jvm.EventCounter is deprecated. Please use org.apache.hadoop.log.metrics.EventCounter in all the log4j.properties files.
      Exception in thread "Thread-1" java.lang.IllegalStateException: Shutdown in progress
      	at java.lang.ApplicationShutdownHooks.remove(ApplicationShutdownHooks.java:55)
      	at java.lang.Runtime.removeShutdownHook(Runtime.java:220)
      	at org.apache.hadoop.fs.FileSystem$Cache.remove(FileSystem.java:2148)
      	at org.apache.hadoop.fs.FileSystem$Cache.closeAll(FileSystem.java:2180)
      	at org.apache.hadoop.fs.FileSystem$Cache.closeAll(FileSystem.java:2157)
      	at org.apache.hadoop.fs.FileSystem.closeAll(FileSystem.java:361)
      	at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$MRAppMasterShutdownHook.run(MRAppMaster.java:1014)
      
      1. MAPREDUCE-4135.patch
        0.9 kB
        Devaraj K
      2. MAPREDUCE-4135.patch
        0.9 kB
        Devaraj K
      3. MAPREDUCE-4135.branch-0.23.patch
        2 kB
        Ravi Prakash

        Issue Links

          Activity

          Hide
          Steve Loughran added a comment -

          The problem is that the FS shutdown process is being triggered by the app master shutdown hook -at which point it is too late to remove other shutdown hooks.

          FileSystem.Cache.remove() should just catch any IllegalStateException and not worry so much about shutdown hook state failures as the system is usually in an odd state at this point.

          Show
          Steve Loughran added a comment - The problem is that the FS shutdown process is being triggered by the app master shutdown hook -at which point it is too late to remove other shutdown hooks. FileSystem.Cache.remove() should just catch any IllegalStateException and not worry so much about shutdown hook state failures as the system is usually in an odd state at this point.
          Hide
          Devaraj K added a comment -

          As Alejandro mentioned in MAPREDUCE-4137, Calling FileSystem.closeAll() from MRAppMaster is not required because the FileSystem has its own shutdown hook ClientFinalizer which does closeAll().

          I have removed it and attached the patch. I have not added any tests because it is just code removal and verified it manually.

          Show
          Devaraj K added a comment - As Alejandro mentioned in MAPREDUCE-4137 , Calling FileSystem.closeAll() from MRAppMaster is not required because the FileSystem has its own shutdown hook ClientFinalizer which does closeAll(). I have removed it and attached the patch. I have not added any tests because it is just code removal and verified it manually.
          Hide
          Alejandro Abdelnur added a comment -

          +1

          Show
          Alejandro Abdelnur added a comment - +1
          Hide
          Devaraj K added a comment -

          Re-attaching the same patch to run Hadoop QA.

          Show
          Devaraj K added a comment - Re-attaching the same patch to run Hadoop QA.
          Hide
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12522393/MAPREDUCE-4135.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          -1 tests included. The patch doesn't appear to include any new or modified tests.
          Please justify why no new tests are needed for this patch.
          Also please list what manual steps were performed to verify this patch.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests:
          org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell
          org.apache.hadoop.yarn.server.TestDiskFailures
          org.apache.hadoop.yarn.server.TestContainerManagerSecurity
          org.apache.hadoop.yarn.server.resourcemanager.TestClientRMService
          org.apache.hadoop.yarn.server.resourcemanager.resourcetracker.TestNMExpiry
          org.apache.hadoop.yarn.server.resourcemanager.TestAMAuthorization
          org.apache.hadoop.yarn.server.resourcemanager.TestApplicationACLs
          org.apache.hadoop.mapred.TestMiniMRClasspath
          org.apache.hadoop.mapreduce.v2.TestMRJobs
          org.apache.hadoop.mapred.TestMiniMRWithDFSWithDistinctUsers
          org.apache.hadoop.mapred.TestMiniMRBringup
          org.apache.hadoop.mapred.TestMiniMRChildTask
          org.apache.hadoop.mapred.TestReduceFetch
          org.apache.hadoop.mapred.TestClusterMRNotification
          org.apache.hadoop.mapred.TestReduceFetchFromPartialMem
          org.apache.hadoop.mapred.TestJobCounters
          org.apache.hadoop.mapreduce.TestChild
          org.apache.hadoop.mapred.TestMiniMRClientCluster
          org.apache.hadoop.ipc.TestSocketFactory
          org.apache.hadoop.mapreduce.v2.TestMRJobsWithHistoryService
          org.apache.hadoop.mapreduce.v2.TestMROldApiJobs
          org.apache.hadoop.mapreduce.v2.TestSpeculativeExecution
          org.apache.hadoop.mapreduce.lib.output.TestJobOutputCommitter
          org.apache.hadoop.mapred.TestClientRedirect
          org.apache.hadoop.mapred.TestLazyOutput
          org.apache.hadoop.mapred.TestJobCleanup
          org.apache.hadoop.mapreduce.TestMapReduceLazyOutput
          org.apache.hadoop.mapred.TestSpecialCharactersInOutputPath
          org.apache.hadoop.mapreduce.v2.TestMRAppWithCombiner
          org.apache.hadoop.conf.TestNoDefaultsJobConf
          org.apache.hadoop.mapreduce.v2.TestRMNMInfo
          org.apache.hadoop.mapred.TestClusterMapReduceTestCase
          org.apache.hadoop.mapreduce.v2.TestNonExistentJob
          org.apache.hadoop.mapred.TestJobSysDirWithDFS
          org.apache.hadoop.mapreduce.v2.TestUberAM
          org.apache.hadoop.mapreduce.v2.TestMiniMRProxyUser
          org.apache.hadoop.mapred.TestJobName
          org.apache.hadoop.mapreduce.security.TestJHSSecurity

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/2209//testReport/
          Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/2209//console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12522393/MAPREDUCE-4135.patch against trunk revision . +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 eclipse:eclipse. The patch built with eclipse:eclipse. +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed these unit tests: org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell org.apache.hadoop.yarn.server.TestDiskFailures org.apache.hadoop.yarn.server.TestContainerManagerSecurity org.apache.hadoop.yarn.server.resourcemanager.TestClientRMService org.apache.hadoop.yarn.server.resourcemanager.resourcetracker.TestNMExpiry org.apache.hadoop.yarn.server.resourcemanager.TestAMAuthorization org.apache.hadoop.yarn.server.resourcemanager.TestApplicationACLs org.apache.hadoop.mapred.TestMiniMRClasspath org.apache.hadoop.mapreduce.v2.TestMRJobs org.apache.hadoop.mapred.TestMiniMRWithDFSWithDistinctUsers org.apache.hadoop.mapred.TestMiniMRBringup org.apache.hadoop.mapred.TestMiniMRChildTask org.apache.hadoop.mapred.TestReduceFetch org.apache.hadoop.mapred.TestClusterMRNotification org.apache.hadoop.mapred.TestReduceFetchFromPartialMem org.apache.hadoop.mapred.TestJobCounters org.apache.hadoop.mapreduce.TestChild org.apache.hadoop.mapred.TestMiniMRClientCluster org.apache.hadoop.ipc.TestSocketFactory org.apache.hadoop.mapreduce.v2.TestMRJobsWithHistoryService org.apache.hadoop.mapreduce.v2.TestMROldApiJobs org.apache.hadoop.mapreduce.v2.TestSpeculativeExecution org.apache.hadoop.mapreduce.lib.output.TestJobOutputCommitter org.apache.hadoop.mapred.TestClientRedirect org.apache.hadoop.mapred.TestLazyOutput org.apache.hadoop.mapred.TestJobCleanup org.apache.hadoop.mapreduce.TestMapReduceLazyOutput org.apache.hadoop.mapred.TestSpecialCharactersInOutputPath org.apache.hadoop.mapreduce.v2.TestMRAppWithCombiner org.apache.hadoop.conf.TestNoDefaultsJobConf org.apache.hadoop.mapreduce.v2.TestRMNMInfo org.apache.hadoop.mapred.TestClusterMapReduceTestCase org.apache.hadoop.mapreduce.v2.TestNonExistentJob org.apache.hadoop.mapred.TestJobSysDirWithDFS org.apache.hadoop.mapreduce.v2.TestUberAM org.apache.hadoop.mapreduce.v2.TestMiniMRProxyUser org.apache.hadoop.mapred.TestJobName org.apache.hadoop.mapreduce.security.TestJHSSecurity +1 contrib tests. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/2209//testReport/ Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/2209//console This message is automatically generated.
          Hide
          Devaraj K added a comment -

          These above test failures are not related to the patch. Seems these are random failures with the reason "java.net.BindException: Problem binding to [0.0.0.0:8031] java.net.BindException: Address already in use". Same tests failed for MAPREDUCE-4140 QA also.

          Show
          Devaraj K added a comment - These above test failures are not related to the patch. Seems these are random failures with the reason "java.net.BindException: Problem binding to [0.0.0.0:8031] java.net.BindException: Address already in use". Same tests failed for MAPREDUCE-4140 QA also.
          Hide
          Ravi Prakash added a comment -

          I'd set automatic close to false to fix MAPREDUCE-3614.

          MRAppMaster.java:1000-1003

                // Do not automatically close FileSystem objects so that in case of
                // SIGTERM I have a chance to write out the job history. I'll be closing
                // the objects myself.
                conf.setBoolean("fs.automatic.close", false);
          

          We cannot let FileSystem close the handle before we write out JobHistory. We'll have to find a way to still gracefully release all resources.

          Show
          Ravi Prakash added a comment - I'd set automatic close to false to fix MAPREDUCE-3614 . MRAppMaster.java:1000-1003 // Do not automatically close FileSystem objects so that in case of // SIGTERM I have a chance to write out the job history. I'll be closing // the objects myself. conf.setBoolean("fs.automatic.close", false); We cannot let FileSystem close the handle before we write out JobHistory. We'll have to find a way to still gracefully release all resources.
          Hide
          Ravi Prakash added a comment -

          This is a particularly vexing problem. I can think of two ways out of this
          1. Swallow IllegalStateExceptions thrown by removeShutdownHook in FileSystem. The javadoc states:

          public boolean removeShutdownHook(Thread hook)
          Throws:
          IllegalStateException - If the virtual machine is already in the process of shutting down

          So if we are getting this exception, it MEANS we are already in the process of shutdown, so we CANNOT, try what we may, removeShutdownHook. If Runtime had a method Runtime.isShutdownInProgress(), we could have checked for it before the removeShutdownHook call. As it stands, there is no such method. In my opinion, this would be a good patch regardless of the needs for this JIRA.

          2. Not send SIGTERMs from the NM to the MR-AM in the first place. Rather we should expose a mechanism for the NM to politely tell the AM its no longer needed and should shutdown asap. Even after this, if an admin were to kill the MRAppMaster with a SIGTERM, the JobHistory would be lost defeating the purpose of 3614

          Show
          Ravi Prakash added a comment - This is a particularly vexing problem. I can think of two ways out of this 1. Swallow IllegalStateExceptions thrown by removeShutdownHook in FileSystem. The javadoc states: public boolean removeShutdownHook(Thread hook) Throws: IllegalStateException - If the virtual machine is already in the process of shutting down So if we are getting this exception, it MEANS we are already in the process of shutdown, so we CANNOT, try what we may, removeShutdownHook. If Runtime had a method Runtime.isShutdownInProgress(), we could have checked for it before the removeShutdownHook call. As it stands, there is no such method. In my opinion, this would be a good patch regardless of the needs for this JIRA. 2. Not send SIGTERMs from the NM to the MR-AM in the first place. Rather we should expose a mechanism for the NM to politely tell the AM its no longer needed and should shutdown asap. Even after this, if an admin were to kill the MRAppMaster with a SIGTERM, the JobHistory would be lost defeating the purpose of 3614
          Hide
          Ravi Prakash added a comment -

          Wait! I thought of another way.
          3. We can @Override stop() in MRAppMaster to call super.stop() and then do the FileSystem.closeAll(). Set a flag to denote that stop() has already been called. In the MRAppMasterShutdownHook, if the flag is set, NOT call the FileSystem.closeAll(). Sheesh! Now I feel stupid for not thinking about this earlier
          What do all y'all think of this way?

          Show
          Ravi Prakash added a comment - Wait! I thought of another way. 3. We can @Override stop() in MRAppMaster to call super.stop() and then do the FileSystem.closeAll(). Set a flag to denote that stop() has already been called. In the MRAppMasterShutdownHook, if the flag is set, NOT call the FileSystem.closeAll(). Sheesh! Now I feel stupid for not thinking about this earlier What do all y'all think of this way?
          Hide
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12522790/MAPREDUCE-4135.branch-0.23.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          -1 tests included. The patch doesn't appear to include any new or modified tests.
          Please justify why no new tests are needed for this patch.
          Also please list what manual steps were performed to verify this patch.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests:
          org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell
          org.apache.hadoop.yarn.server.TestDiskFailures
          org.apache.hadoop.yarn.server.TestContainerManagerSecurity
          org.apache.hadoop.yarn.server.resourcemanager.TestClientRMService
          org.apache.hadoop.yarn.server.resourcemanager.resourcetracker.TestNMExpiry
          org.apache.hadoop.yarn.server.resourcemanager.TestAMAuthorization
          org.apache.hadoop.yarn.server.resourcemanager.TestApplicationACLs
          org.apache.hadoop.mapred.TestMiniMRClasspath
          org.apache.hadoop.mapreduce.v2.TestMRJobs
          org.apache.hadoop.mapred.TestMiniMRWithDFSWithDistinctUsers
          org.apache.hadoop.mapred.TestMiniMRBringup
          org.apache.hadoop.mapred.TestMiniMRChildTask
          org.apache.hadoop.mapred.TestReduceFetch
          org.apache.hadoop.mapred.TestClusterMRNotification
          org.apache.hadoop.mapred.TestReduceFetchFromPartialMem
          org.apache.hadoop.mapred.TestJobCounters
          org.apache.hadoop.mapreduce.TestChild
          org.apache.hadoop.mapred.TestMiniMRClientCluster
          org.apache.hadoop.ipc.TestSocketFactory
          org.apache.hadoop.mapreduce.v2.TestMRJobsWithHistoryService
          org.apache.hadoop.mapreduce.v2.TestMROldApiJobs
          org.apache.hadoop.mapreduce.v2.TestSpeculativeExecution
          org.apache.hadoop.mapreduce.lib.output.TestJobOutputCommitter
          org.apache.hadoop.mapred.TestClientRedirect
          org.apache.hadoop.mapred.TestLazyOutput
          org.apache.hadoop.mapred.TestJobCleanup
          org.apache.hadoop.mapreduce.TestMapReduceLazyOutput
          org.apache.hadoop.mapred.TestSpecialCharactersInOutputPath
          org.apache.hadoop.mapreduce.v2.TestMRAppWithCombiner
          org.apache.hadoop.conf.TestNoDefaultsJobConf
          org.apache.hadoop.mapreduce.v2.TestRMNMInfo
          org.apache.hadoop.mapred.TestClusterMapReduceTestCase
          org.apache.hadoop.mapreduce.v2.TestNonExistentJob
          org.apache.hadoop.mapred.TestJobSysDirWithDFS
          org.apache.hadoop.mapreduce.v2.TestUberAM
          org.apache.hadoop.mapreduce.v2.TestMiniMRProxyUser
          org.apache.hadoop.mapred.TestJobName
          org.apache.hadoop.mapreduce.security.TestJHSSecurity

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/2225//testReport/
          Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/2225//console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12522790/MAPREDUCE-4135.branch-0.23.patch against trunk revision . +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 eclipse:eclipse. The patch built with eclipse:eclipse. +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed these unit tests: org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell org.apache.hadoop.yarn.server.TestDiskFailures org.apache.hadoop.yarn.server.TestContainerManagerSecurity org.apache.hadoop.yarn.server.resourcemanager.TestClientRMService org.apache.hadoop.yarn.server.resourcemanager.resourcetracker.TestNMExpiry org.apache.hadoop.yarn.server.resourcemanager.TestAMAuthorization org.apache.hadoop.yarn.server.resourcemanager.TestApplicationACLs org.apache.hadoop.mapred.TestMiniMRClasspath org.apache.hadoop.mapreduce.v2.TestMRJobs org.apache.hadoop.mapred.TestMiniMRWithDFSWithDistinctUsers org.apache.hadoop.mapred.TestMiniMRBringup org.apache.hadoop.mapred.TestMiniMRChildTask org.apache.hadoop.mapred.TestReduceFetch org.apache.hadoop.mapred.TestClusterMRNotification org.apache.hadoop.mapred.TestReduceFetchFromPartialMem org.apache.hadoop.mapred.TestJobCounters org.apache.hadoop.mapreduce.TestChild org.apache.hadoop.mapred.TestMiniMRClientCluster org.apache.hadoop.ipc.TestSocketFactory org.apache.hadoop.mapreduce.v2.TestMRJobsWithHistoryService org.apache.hadoop.mapreduce.v2.TestMROldApiJobs org.apache.hadoop.mapreduce.v2.TestSpeculativeExecution org.apache.hadoop.mapreduce.lib.output.TestJobOutputCommitter org.apache.hadoop.mapred.TestClientRedirect org.apache.hadoop.mapred.TestLazyOutput org.apache.hadoop.mapred.TestJobCleanup org.apache.hadoop.mapreduce.TestMapReduceLazyOutput org.apache.hadoop.mapred.TestSpecialCharactersInOutputPath org.apache.hadoop.mapreduce.v2.TestMRAppWithCombiner org.apache.hadoop.conf.TestNoDefaultsJobConf org.apache.hadoop.mapreduce.v2.TestRMNMInfo org.apache.hadoop.mapred.TestClusterMapReduceTestCase org.apache.hadoop.mapreduce.v2.TestNonExistentJob org.apache.hadoop.mapred.TestJobSysDirWithDFS org.apache.hadoop.mapreduce.v2.TestUberAM org.apache.hadoop.mapreduce.v2.TestMiniMRProxyUser org.apache.hadoop.mapred.TestJobName org.apache.hadoop.mapreduce.security.TestJHSSecurity +1 contrib tests. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/2225//testReport/ Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/2225//console This message is automatically generated.
          Hide
          Ravi Prakash added a comment -

          Hi Devaraj! I've assigned this JIRA to myself. I hope you don't mind.

          AFTER applying the latest patch, inside MRAppMaster.stop(), super.stop() is throwing an InterruptedException

          2012-04-24 09:25:20,104 WARN [AsyncDispatcher event handler] org.apache.hadoop.yarn.event.AsyncDispatcher: Interrupted Exception while stopping
          java.lang.InterruptedException
                  at java.lang.Object.wait(Native Method)
                  at java.lang.Thread.join(Thread.java:1186)
                  at java.lang.Thread.join(Thread.java:1239)
                  at org.apache.hadoop.yarn.event.AsyncDispatcher.stop(AsyncDispatcher.java:104)
                  at org.apache.hadoop.yarn.service.CompositeService.stop(CompositeService.java:99)
                  at org.apache.hadoop.yarn.service.CompositeService.stop(CompositeService.java:89)
                  at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.stop(MRAppMaster.java:1016)
                  at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobFinishEventHandler.handle(MRAppMaster.java:418)
                  at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobFinishEventHandler.handle(MRAppMaster.java:386)
                  at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:125)
                  at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:74)
                  at java.lang.Thread.run(Thread.java:662)
          

          This is causing the subsequent statement (FileSystem.closeAll() ) to be skipped. So to fix this JIRA, I'll see what I can do to fix the InterruptedException.

          Show
          Ravi Prakash added a comment - Hi Devaraj! I've assigned this JIRA to myself. I hope you don't mind. AFTER applying the latest patch, inside MRAppMaster.stop(), super.stop() is throwing an InterruptedException 2012-04-24 09:25:20,104 WARN [AsyncDispatcher event handler] org.apache.hadoop.yarn.event.AsyncDispatcher: Interrupted Exception while stopping java.lang.InterruptedException at java.lang.Object.wait(Native Method) at java.lang.Thread.join(Thread.java:1186) at java.lang.Thread.join(Thread.java:1239) at org.apache.hadoop.yarn.event.AsyncDispatcher.stop(AsyncDispatcher.java:104) at org.apache.hadoop.yarn.service.CompositeService.stop(CompositeService.java:99) at org.apache.hadoop.yarn.service.CompositeService.stop(CompositeService.java:89) at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.stop(MRAppMaster.java:1016) at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobFinishEventHandler.handle(MRAppMaster.java:418) at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobFinishEventHandler.handle(MRAppMaster.java:386) at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:125) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:74) at java.lang.Thread.run(Thread.java:662) This is causing the subsequent statement (FileSystem.closeAll() ) to be skipped. So to fix this JIRA, I'll see what I can do to fix the InterruptedException.
          Hide
          Ravi Prakash added a comment -

          I'm sorry I was wrong about my earlier statement:

          This is causing the subsequent statement (FileSystem.closeAll() ) to be skipped. So to fix this JIRA, I'll see what I can do to fix the InterruptedException.

          The InterruptedException is thrown and then caught. So its not that. Its MAPREDUCE-4157. The SIGTERM is sent without any delay on ContainerLaunch.java:337.

          Show
          Ravi Prakash added a comment - I'm sorry I was wrong about my earlier statement: This is causing the subsequent statement (FileSystem.closeAll() ) to be skipped. So to fix this JIRA, I'll see what I can do to fix the InterruptedException. The InterruptedException is thrown and then caught. So its not that. Its MAPREDUCE-4157 . The SIGTERM is sent without any delay on ContainerLaunch.java:337.
          Hide
          Ravi Prakash added a comment -

          We can put this patch in after MAPREDUCE-4157 and this problem should go away!

          Show
          Ravi Prakash added a comment - We can put this patch in after MAPREDUCE-4157 and this problem should go away!
          Hide
          Alejandro Abdelnur added a comment -

          First, I don't think FileSystem.closeAll() should try to remove its shutdown hook, the hook could be there and it would just be a NOP.

          Secondly, shutdown hooks are run in parallel/non-deterministic-order. if there is nothing that ensures that the FileSystem shutdown hook won't be run until after the MRAppMaster shutdown hook.

          IMO the correct solution here would be:

          • All close/destroy/cleanup operations are done only via Service's stop() method. This ensures an order.
          • The outermost service is the only one adding a shutdown hook and within it i call its stop() which will do a NOP if it has been already called.
          Show
          Alejandro Abdelnur added a comment - First, I don't think FileSystem.closeAll() should try to remove its shutdown hook, the hook could be there and it would just be a NOP. Secondly, shutdown hooks are run in parallel/non-deterministic-order. if there is nothing that ensures that the FileSystem shutdown hook won't be run until after the MRAppMaster shutdown hook. IMO the correct solution here would be: All close/destroy/cleanup operations are done only via Service's stop() method. This ensures an order. The outermost service is the only one adding a shutdown hook and within it i call its stop() which will do a NOP if it has been already called.
          Hide
          Alejandro Abdelnur added a comment -

          The shutdown hook in MRAppMaster should be retrofitted to use the ShutdownHookManager (HADOOP-8325)

          Show
          Alejandro Abdelnur added a comment - The shutdown hook in MRAppMaster should be retrofitted to use the ShutdownHookManager ( HADOOP-8325 )
          Hide
          Alejandro Abdelnur added a comment -

          HADOOP-8325 & MAPREDUCE-4135 take care of this problem in a more generic way. Please evaluate and close if so.

          Show
          Alejandro Abdelnur added a comment - HADOOP-8325 & MAPREDUCE-4135 take care of this problem in a more generic way. Please evaluate and close if so.
          Hide
          Ravi Prakash added a comment -

          Thanks Tucu! I hadn't seen HADOOP-8325. You probably meant MAPREDUCE-4157. I am planning on letting this JIRA stay open until either / both of those JIRA make the problem go away.

          Show
          Ravi Prakash added a comment - Thanks Tucu! I hadn't seen HADOOP-8325 . You probably meant MAPREDUCE-4157 . I am planning on letting this JIRA stay open until either / both of those JIRA make the problem go away.
          Hide
          Ravi Prakash added a comment -

          Thanks Tucu! HADOOP-8325 seems to have fixed this issue. Duping this jira

          Show
          Ravi Prakash added a comment - Thanks Tucu! HADOOP-8325 seems to have fixed this issue. Duping this jira

            People

            • Assignee:
              Ravi Prakash
              Reporter:
              Devaraj K
            • Votes:
              0 Vote for this issue
              Watchers:
              6 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development