Job history can be lost if there is any issue writing the jhist file while the AM shuts down the job

Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 2.4.1
    • None
    • None
    • None

    Description

      We found that a job can succeed (it is shown as successful in the RM), yet if writing the jhist file fails, the job cannot be seen in JobHistory.

      2015-07-04 14:54:37,852 INFO [Thread-94] org.apache.hadoop.service.AbstractService: Service org.apache.hadoop.mapreduce.v2.app.MRAppMaster failed in state STOPPED; cause: org.apache.hadoop.yarn.exceptions.YarnRuntimeException: java.io.IOException: All datanodes 9.96.1.171:25009 are bad. Aborting...
      org.apache.hadoop.yarn.exceptions.YarnRuntimeException: java.io.IOException: All datanodes 9.96.1.171:25009 are bad. Aborting...
      	at org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler.handleEvent(JobHistoryEventHandler.java:546)
      	at org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler.serviceStop(JobHistoryEventHandler.java:340)
      	at org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221)
      	at org.apache.hadoop.service.ServiceOperations.stop(ServiceOperations.java:52)
      	at org.apache.hadoop.service.ServiceOperations.stopQuietly(ServiceOperations.java:80)
      	at org.apache.hadoop.service.CompositeService.stop(CompositeService.java:157)
      	at org.apache.hadoop.service.CompositeService.serviceStop(CompositeService.java:131)
      	at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.serviceStop(MRAppMaster.java:1609)
      	at org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221)
      	at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.stop(MRAppMaster.java:1107)
      	at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.shutDownJob(MRAppMaster.java:554)
      	at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobFinishEventHandler$1.run(MRAppMaster.java:603)
      Caused by: java.io.IOException: All datanodes 9.96.1.171:25009 are bad. Aborting...
      	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1145)
      	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:926)
      	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:486)
      2015-07-04 14:54:37,853 WARN [Thread-94] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Graceful stop failed. Exiting.. 
      org.apache.hadoop.yarn.exceptions.YarnRuntimeException: java.io.IOException: All datanodes 9.96.1.171:25009 are bad. Aborting...
      	at org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler.handleEvent(JobHistoryEventHandler.java:546)
      	at org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler.serviceStop(JobHistoryEventHandler.java:340)
      	at org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221)
      	at org.apache.hadoop.service.ServiceOperations.stop(ServiceOperations.java:52)
      	at org.apache.hadoop.service.ServiceOperations.stopQuietly(ServiceOperations.java:80)
      	at org.apache.hadoop.service.CompositeService.stop(CompositeService.java:157)
      	at org.apache.hadoop.service.CompositeService.serviceStop(CompositeService.java:131)
      	at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.serviceStop(MRAppMaster.java:1609)
      	at org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221)
      	at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.stop(MRAppMaster.java:1107)
      	at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.shutDownJob(MRAppMaster.java:554)
      	at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobFinishEventHandler$1.run(MRAppMaster.java:603)
      Caused by: java.io.IOException: All datanodes 9.96.1.171:25009 are bad. Aborting...
      	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1145)
      	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:926)
      	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:486)
      2015-07-04 14:54:37,854 INFO [Thread-94] org.apache.hadoop.util.ExitUtil: Exiting with status 1
      

      We can probably mark the job as failed and inform the RM if this happens. Thoughts?
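
      To make the proposal concrete, here is a minimal sketch of the idea, not the actual MRAppMaster code. All types below (FinalStatus, HistoryWriter, RmClient, ShutdownSketch) are hypothetical stand-ins invented for illustration. It also assumes the history file is flushed before the final status is sent to the RM, which may not match the current shutdown order.

```java
import java.io.IOException;

/** Hypothetical final status reported to the RM; not a real Hadoop type. */
enum FinalStatus { SUCCEEDED, FAILED }

/** Hypothetical stand-in for whatever flushes and closes the jhist file. */
interface HistoryWriter {
    void flushAndClose() throws IOException;
}

/** Hypothetical stand-in for the AM-to-RM unregistration call. */
interface RmClient {
    void unregister(FinalStatus status, String diagnostics);
}

final class ShutdownSketch {
    /**
     * Flushes job history first, then reports the final status.
     * If the history write fails, the job is downgraded to FAILED
     * so that the RM does not show a success that JobHistory cannot show.
     */
    static FinalStatus shutDownJob(FinalStatus jobStatus,
                                   HistoryWriter history,
                                   RmClient rm) {
        FinalStatus reported = jobStatus;
        String diagnostics = "";
        try {
            history.flushAndClose();
        } catch (IOException | RuntimeException e) {
            // The log above shows the IOException wrapped in a runtime
            // exception, so both kinds are treated as a failed write.
            reported = FinalStatus.FAILED;
            diagnostics = "Job history could not be written: " + e.getMessage();
        }
        rm.unregister(reported, diagnostics);
        return reported;
    }
}
```

      With this ordering, a failure such as "All datanodes ... are bad" during the jhist write would surface in the RM as a failed application with a diagnostic message, instead of a successful one with no history. Whether failing an otherwise successful job is the right trade-off is the open question here.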

          People

            varun_saxena Varun Saxena