Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Duplicate
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: None
    • Labels:
      None

      Description

      Run a job , restart RM when job just finished. It should not restart the job once it Succeed.

      After RM restart, The AM of restarted job fails with below error.

      AM log after Rmrestart:

      013-08-19 17:29:21,144 INFO [main] org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler: Stopping JobHistoryEventHandler. Size of the outstanding queue size is 0
      2013-08-19 17:29:21,145 INFO [main] org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler: Stopped JobHistoryEventHandler. super.stop()
      2013-08-19 17:29:21,146 INFO [main] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Deleting staging directory hdfs://host1:port1/user/ABC/.staging/job_1376933101704_0001
      2013-08-19 17:29:21,156 FATAL [main] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Error starting MRAppMaster
      org.apache.hadoop.yarn.exceptions.YarnRuntimeException: java.io.FileNotFoundException: File does not exist: hdfs://host1:port1/ABC/.staging/job_1376933101704_0001/job.splitmetainfo
      at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl$InitTransition.createSplits(JobImpl.java:1469)
      at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl$InitTransition.transition(JobImpl.java:1324)
      at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl$InitTransition.transition(JobImpl.java:1291)
      at org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385)
      at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
      at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
      at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
      at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:922)
      at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:131)
      at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobEventDispatcher.handle(MRAppMaster.java:1184)
      at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.serviceStart(MRAppMaster.java:995)
      at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
      at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$1.run(MRAppMaster.java:1394)
      at java.security.AccessController.doPrivileged(Native Method)
      at javax.security.auth.Subject.doAs(Subject.java:396)
      at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1477)
      at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.initAndStartAppMaster(MRAppMaster.java:1390)
      at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.main(MRAppMaster.java:1323)
      Caused by: java.io.FileNotFoundException: File does not exist: hdfs://host1:port1/ABC/.staging/job_1376933101704_0001/job.splitmetainfo
      at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1121)
      at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1113)
      at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:78)
      at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1113)
      at org.apache.hadoop.mapreduce.split.SplitMetaInfoReader.readSplitMetaInfo(SplitMetaInfoReader.java:51)
      at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl$InitTransition.createSplits(JobImpl.java:1464)
      ... 17 more
      2013-08-19 17:29:21,158 INFO [Thread-2] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: MRAppMaster received a signal. Signaling RMCommunicator and JobHistoryEventHandler.
      2013-08-19 17:29:21,159 WARN [Thread-2] org.apache.hadoop.util.ShutdownHookManager: ShutdownHook 'MRAppMasterShutdownHook' failed, java.lang.NullPointerException
      java.lang.NullPointerException
      at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$ContainerAllocatorRouter.setSignalled(MRAppMaster.java:805)
      at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$MRAppMasterShutdownHook.run(MRAppMaster.java:1344)
      at org.apache.hadoop.util.ShutdownHookManager$1.run(ShutdownHookManager.java:54)

        Attachments

        1. MR5471-2AM.log
          21 kB
          Yesha Vora
        2. MR5471-1AM.log
          39 kB
          Yesha Vora

          Issue Links

            Activity

              People

              • Assignee:
                jianhe Jian He
                Reporter:
                yeshavora Yesha Vora
              • Votes:
                0 Vote for this issue
                Watchers:
                13 Start watching this issue

                Dates

                • Created:
                  Updated:
                  Resolved: