[MAPREDUCE-4770] Hadoop jobs failing with FileNotFound Exception while the job is still running - ASF JIRA


Details

Type: Bug
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: 0.20.203.0
Fix Version/s: None
Component/s: tasktracker
Labels: None

Description

We are having a strange issue in our Hadoop cluster. We have noticed that some of our jobs fail with a FileNotFoundException (see below). Basically, the files in the "attempt_*" directory, and the directory itself, are being deleted while the task is still running on the host. Looking through some of the Hadoop documentation, I see that the job directory gets wiped out when the TaskTracker receives a KillJobAction; however, I am not sure why it gets wiped out while the job is still running.

My question is what could be deleting it while the job is running? Any thoughts or pointers on how to debug this would be helpful.
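One low-level way to find out when the attempt_* directories disappear is to watch the TaskTracker's local directory and log every create and delete event with a timestamp, then line those timestamps up with the TaskTracker and JobTracker logs (for example, against the arrival of a KillJobAction). The sketch below is not part of Hadoop, and the class name JobCacheWatcher is hypothetical. It assumes Java 8 or later is available on the TaskTracker host (Hadoop 0.20.203 clusters commonly ran on Java 6, so this would run as a separate JVM). It records when entries vanish, not which process removed them; identifying the deleting process would need an OS-level tool such as the Linux audit subsystem (auditd).

```java
import java.io.IOException;
import java.nio.file.FileSystems;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.LinkOption;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.WatchEvent;
import java.nio.file.WatchKey;
import java.nio.file.WatchService;
import java.nio.file.attribute.BasicFileAttributes;
import java.time.Instant;

import static java.nio.file.StandardWatchEventKinds.ENTRY_CREATE;
import static java.nio.file.StandardWatchEventKinds.ENTRY_DELETE;
import static java.nio.file.StandardWatchEventKinds.OVERFLOW;

/** Hypothetical helper: logs creations and deletions under a directory tree. */
public class JobCacheWatcher {

    /** Registers root and every directory below it; returns the number registered. */
    static int registerAll(final WatchService ws, Path root) throws IOException {
        final int[] count = {0};
        Files.walkFileTree(root, new SimpleFileVisitor<Path>() {
            @Override
            public FileVisitResult preVisitDirectory(Path dir, BasicFileAttributes attrs)
                    throws IOException {
                dir.register(ws, ENTRY_CREATE, ENTRY_DELETE);
                count[0]++;
                return FileVisitResult.CONTINUE;
            }

            @Override
            public FileVisitResult visitFileFailed(Path file, IOException e) {
                // Entries that vanish or are unreadable mid-walk are skipped, not fatal.
                return FileVisitResult.CONTINUE;
            }
        });
        return count[0];
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        if (args.length != 1) {
            System.err.println("usage: java JobCacheWatcher <local-dir>");
            System.exit(2);
        }
        WatchService ws = FileSystems.getDefault().newWatchService();
        registerAll(ws, Paths.get(args[0]));
        while (true) {
            WatchKey key = ws.take();                  // blocks until events arrive
            Path dir = (Path) key.watchable();
            for (WatchEvent<?> ev : key.pollEvents()) {
                if (ev.kind() == OVERFLOW) {
                    System.out.println(Instant.now() + " OVERFLOW (some events were lost)");
                    continue;
                }
                Path child = dir.resolve((Path) ev.context());
                System.out.println(Instant.now() + " " + ev.kind().name() + " " + child);
                // WatchService is not recursive: register newly created directories too.
                if (ev.kind() == ENTRY_CREATE
                        && Files.isDirectory(child, LinkOption.NOFOLLOW_LINKS)) {
                    try {
                        registerAll(ws, child);
                    } catch (IOException e) {
                        System.out.println(Instant.now() + " could not watch " + child + ": " + e);
                    }
                }
            }
            // reset() returns false if the key is no longer valid (e.g. its directory was deleted).
            key.reset();
        }
    }
}
```

Usage, with the path prefix taken from the trace and standing in for the TaskTracker's real local directory: java JobCacheWatcher /hadoop/mapred/local_data/taskTracker. Because WatchService does not watch subdirectories on its own, new directories are registered as they appear, so events inside a directory in the brief window right after its creation can be missed.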

Thanks!

java.io.FileNotFoundException: /hadoop/mapred/local_data/taskTracker//jobcache/job_201211030344_15383/attempt_201211030344_15383_m_000169_0/output/spill29.out (Permission denied)
	at java.io.FileInputStream.open(Native Method)
	at java.io.FileInputStream.<init>(FileInputStream.java:120)
	at org.apache.hadoop.fs.RawLocalFileSystem$TrackingFileInputStream.<init>(RawLocalFileSystem.java:71)
	at org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileInputStream.<init>(RawLocalFileSystem.java:107)
	at org.apache.hadoop.fs.RawLocalFileSystem.open(RawLocalFileSystem.java:177)
	at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:400)
	at org.apache.hadoop.mapred.Merger$Segment.init(Merger.java:205)
	at org.apache.hadoop.mapred.Merger$Segment.access$100(Merger.java:165)
	at org.apache.hadoop.mapred.Merger$MergeQueue.merge(Merger.java:418)
	at org.apache.hadoop.mapred.Merger$MergeQueue.merge(Merger.java:381)
	at org.apache.hadoop.mapred.Merger.merge(Merger.java:77)
	at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.mergeParts(MapTask.java:1692)
	at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:1322)
	at org.apache.hadoop.mapred.MapTask$NewOutputCollector.close(MapTask.java:698)
	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:765)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:369)
	at org.apache.hadoop.mapred.Child$4.run(Child.java:259)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:396)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
	at org.apache.hadoop.mapred.Child.main(Child.java:253)
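The exception type and its message can point in different directions. Per the Java SE documentation, the FileInputStream constructor throws FileNotFoundException not only when the file does not exist, but also when it is a directory or cannot be opened for reading for some other reason; on Unix-like systems the text in parentheses is the operating-system error. The trace above reports (Permission denied) rather than (No such file or directory), so it may be worth checking the permissions on spill29.out and its parent directories as well as looking for a deletion. The hypothetical class SpillFileCheck below is a minimal sketch of such a check using only java.io. Two caveats: File.exists() also returns false when a parent directory cannot be searched, and canRead() may report true for a privileged user such as root even on files marked unreadable.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;

/** Hypothetical helper: reports why a local file can or cannot be opened. */
public class SpillFileCheck {

    /** Classifies a path without opening it. */
    static String diagnose(File f) {
        if (!f.exists()) {
            // Deleted, never created, or a parent directory is not searchable.
            return "missing";
        }
        if (f.isDirectory()) {
            return "directory";
        }
        if (!f.canRead()) {
            // Present, but this user lacks read permission.
            return "unreadable";
        }
        return "readable";
    }

    public static void main(String[] args) {
        for (String arg : args) {
            File f = new File(arg);
            System.out.println(arg + ": " + diagnose(f));
            try (FileInputStream in = new FileInputStream(f)) {
                System.out.println("  opened OK");
            } catch (FileNotFoundException e) {
                // The message ends with the OS error, e.g. "(Permission denied)".
                System.out.println("  FileNotFoundException: " + e.getMessage());
            } catch (IOException e) {
                // Only close() can land here.
                System.out.println("  IOException: " + e.getMessage());
            }
        }
    }
}
```

Run it (java SpillFileCheck <path>) as the same user the task runs as, since the answer depends on who is asking.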

People

Assignee: Unassigned

Reporter: Jaikannan Ramamoorthy

Votes: 0

Watchers: 5

Dates

Created: 05/Nov/12 02:19

Updated: 25/Jan/13 05:25