Description
Some jobs report an error like this:
hadoop.mapreduce.Job.monitorAndPrintJob(Job.java 1367) [main] : map 100% reduce 100%
[2017-08-31T20:27:12.591+08:00] [INFO] hadoop.mapred.ClientServiceDelegate.getProxy(ClientServiceDelegate.java 277) [main] : Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
[2017-08-31T20:27:12.821+08:00] [INFO] hadoop.mapred.ClientServiceDelegate.getProxy(ClientServiceDelegate.java 277) [main] : Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
[2017-08-31T20:27:13.039+08:00] [INFO] hadoop.mapred.ClientServiceDelegate.getProxy(ClientServiceDelegate.java 277) [main] : Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
[2017-08-31T20:27:13.256+08:00] [ERROR] hadoop.streaming.StreamJob.submitAndMonitorJob(StreamJob.java 1034) [main] : Error Launching job : java.io.IOException: Unknown Job job_xxx_xxx
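I checked the AM container log (excerpt below). It shows that the error happened in the HDFS write pipeline, possibly because of a DataNode (DN) failure. I have also seen other causes that close the JobHistoryEventHandler. Once the JobHistoryEventHandler is closed, the MR AM cannot write the job information for the JobHistory server, so when the client is redirected to the job history server it cannot tell whether the application finished and fails with "Unknown Job".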
2017-08-31 20:27:10,813 INFO [Thread-1968] org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler: In stop, writing event MAP_ATTEMPT_STARTED
2017-08-31 20:27:10,814 ERROR [Thread-1968] org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler: Error writing History Event: org.apache.hadoop.mapreduce.jobhistory.TaskAttemptStartedEvent@2055ea0a
java.io.EOFException: Premature EOF: no length prefix available
    at org.apache.hadoop.hdfs.protocolPB.PBHelper.vintPrefixed(PBHelper.java:2292)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.createBlockOutputStream(DFSOutputStream.java:1317)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1237)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:449)
2017-08-31 20:27:10,814 INFO [Thread-1968] org.apache.hadoop.service.AbstractService: Service JobHistoryEventHandler failed in state STOPPED; cause: org.apache.hadoop.yarn.exceptions.YarnRuntimeException: java.io.EOFException: Premature EOF: no length prefix available
org.apache.hadoop.yarn.exceptions.YarnRuntimeException: java.io.EOFException: Premature EOF: no length prefix available
    at org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler.handleEvent(JobHistoryEventHandler.java:580)
    at org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler.serviceStop(JobHistoryEventHandler.java:374)
    at org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221)
    at org.apache.hadoop.service.ServiceOperations.stop(ServiceOperations.java:52)
    at org.apache.hadoop.service.ServiceOperations.stopQuietly(ServiceOperations.java:80)
    at org.apache.hadoop.service.CompositeService.stop(CompositeService.java:157)
    at org.apache.hadoop.service.CompositeService.serviceStop(CompositeService.java:131)
This problem is serious, especially for Hive: the job has to be rerun even though the application itself finished with FinalApplicationStatus=SUCCEEDED. I therefore think we should retry the operation of writing history events.
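To make the proposal concrete, here is a minimal sketch of a bounded retry around a single history-event write, assuming a fixed backoff between attempts. HistoryWrite, writeWithRetry, maxAttempts and backoffMs are hypothetical names for illustration only. They are not part of Hadoop's JobHistoryEventHandler API, and this is not how Hadoop actually implements the write path.

```java
import java.io.EOFException;
import java.io.IOException;
import java.util.concurrent.atomic.AtomicInteger;

/**
 * Hypothetical sketch: retry one history-event write a bounded number of
 * times before giving up. Not Hadoop's actual JobHistoryEventHandler code.
 */
public class HistoryWriteRetry {

    /** One write attempt that may fail with an IOException (hypothetical). */
    @FunctionalInterface
    public interface HistoryWrite {
        void write() throws IOException;
    }

    /**
     * Runs op up to maxAttempts times, sleeping backoffMs between failed
     * attempts. Returns the number of attempts used on success, or rethrows
     * the last IOException once every attempt has failed.
     */
    public static int writeWithRetry(HistoryWrite op, int maxAttempts, long backoffMs)
            throws IOException, InterruptedException {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        IOException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                op.write();
                return attempt; // write succeeded
            } catch (IOException e) {
                last = e; // remember the failure, e.g. a pipeline EOF
                if (attempt < maxAttempts) {
                    Thread.sleep(backoffMs); // fixed backoff before retrying
                }
            }
        }
        throw last; // all attempts failed
    }

    public static void main(String[] args) throws Exception {
        // Simulate a write that fails twice (e.g. a bad DN in the pipeline), then succeeds.
        AtomicInteger calls = new AtomicInteger();
        int used = writeWithRetry(() -> {
            if (calls.incrementAndGet() < 3) {
                throw new EOFException("Premature EOF: no length prefix available");
            }
        }, 5, 10);
        System.out.println("succeeded after " + used + " attempts");
    }
}
```

Running main prints "succeeded after 3 attempts". A failed HDFS output stream may not accept further writes, so in practice the op passed in would probably have to reopen the history file rather than reuse the broken stream. That detail is left as an assumption here.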