[MAPREDUCE-133] Getting errors in reading the output files of a map/reduce job immediately after the job is complete - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Major
Resolution: Cannot Reproduce
Affects Version/s: None
Fix Version/s: None
Component/s: None
Labels: None

Description

I have an app that fires up map/reduce jobs sequentially. The output of one job is the input of the next.
I observe that many map tasks fail due to file read errors:

java.rmi.RemoteException: java.io.IOException: Cannot open filename /user/runping/runping/docs_store/stage_2/base_docs/part-00186
    at org.apache.hadoop.dfs.NameNode.open(NameNode.java:130)
    at sun.reflect.GeneratedMethodAccessor8.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:585)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:237)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:216)
    at org.apache.hadoop.ipc.Client.call(Client.java:303)
    at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:141)
    at org.apache.hadoop.dfs.$Proxy1.open(Unknown Source)
    at org.apache.hadoop.dfs.DFSClient$DFSInputStream.openInfo(DFSClient.java:315)
    at org.apache.hadoop.dfs.DFSClient$DFSInputStream.<init>(DFSClient.java:302)
    at org.apache.hadoop.dfs.DFSClient.open(DFSClient.java:95)
    at org.apache.hadoop.dfs.DistributedFileSystem.openRaw(DistributedFileSystem.java:78)
    at org.apache.hadoop.fs.FSDataInputStream$Checker.<init>(FSDataInputStream.java:46)
    at org.apache.hadoop.fs.FSDataInputStream.<init>(FSDataInputStream.java:220)
    at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:146)
    at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:234)
    at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:226)
    at org.apache.hadoop.mapred.SequenceFileRecordReader.<init>(SequenceFileRecordReader.java:36)
    at org.apache.hadoop.mapred.SequenceFileInputFormat.getRecordReader(SequenceFileInputFormat.java:53)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:105)
    at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:709)

Those tasks succeeded on the second or third try.

After inserting a 10-second sleep between consecutive jobs, the problem disappears.
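
The two observations above (failed reads succeed when retried, and a fixed pause hides the problem) suggest the output files become readable shortly after the job reports completion. As a rough sketch of an alternative to a fixed sleep, the helper below retries a read a bounded number of times with a short delay between attempts. `RetryingOpen` and `withRetries` are hypothetical names used only for illustration; they are not part of Hadoop, and this works around the symptom rather than addressing its cause.

```java
import java.io.IOException;
import java.util.concurrent.Callable;

// Hypothetical helper for illustration; not a Hadoop class.
final class RetryingOpen {

    /**
     * Calls the action, retrying only when it throws IOException.
     * Makes at most maxAttempts calls and sleeps delayMillis between them.
     * Any other exception is propagated immediately.
     */
    static <T> T withRetries(Callable<T> action, int maxAttempts, long delayMillis)
            throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        IOException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return action.call();
            } catch (IOException e) {
                last = e; // remember the failure in case this was the final attempt
                if (attempt < maxAttempts) {
                    Thread.sleep(delayMillis);
                }
            }
        }
        throw last; // every attempt failed with IOException
    }
}
```

A caller would wrap whatever opens the input file in the `Callable`, so a read that fails once with "Cannot open filename" is retried instead of failing the whole step.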

Here is my code to detect whether a job is completed:

try {
    running = jc.submitJob(job);
    String jobId = running.getJobID();
    System.out.println("start job:\t" + jobId);
    while (!running.isComplete()) {
        try {
            Thread.sleep(1000);
        } catch (InterruptedException e) {}
        running = jc.getJob(jobId);
    }
    sucess = running.isSuccessful();
} finally {
    if (!sucess && (running != null)) {
        running.killJob();
    }
    jc.close();
}
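
One side note on the loop above: the empty catch block discards `InterruptedException`, so the wait cannot be cancelled by interrupting the thread. The sketch below shows the same polling pattern with the interrupt propagated to the caller. `JobPoller` and `JobHandle` are hypothetical stand-ins written for this example; `JobHandle` only mirrors the two status calls used in the report and is not the Hadoop job interface.

```java
import java.io.IOException;

// Hypothetical names for illustration; not Hadoop classes.
final class JobPoller {

    /** Minimal stand-in for a job handle exposing the two status checks used above. */
    interface JobHandle {
        boolean isComplete() throws IOException;
        boolean isSuccessful() throws IOException;
    }

    /**
     * Polls until the job reports completion, sleeping pollMillis between checks,
     * then returns whether it succeeded. An interrupt ends the wait by
     * propagating InterruptedException instead of being swallowed.
     */
    static boolean waitForCompletion(JobHandle job, long pollMillis)
            throws IOException, InterruptedException {
        while (!job.isComplete()) {
            Thread.sleep(pollMillis);
        }
        return job.isSuccessful();
    }
}
```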

People

Assignee: Owen O'Malley

Reporter: Runping Qi

Votes: 0

Watchers: 0

Dates

Created: 25/Apr/06 05:01

Updated: 17/Jul/11 20:50

Resolved: 17/Jul/11 20:50