
Key: HADOOP-5248
Type: Bug
Status: Closed
Resolution: Fixed
Priority: Blocker
Assignee: Devaraj Das
Reporter: Hemanth Yamijala
Votes: 0
Watchers: 2
Hadoop Common

Job directories could remain undeleted in some scenarios after job completes.

Created: 13/Feb/09 08:33 AM   Updated: 08/Jul/09 04:53 PM
Component/s: None
Affects Version/s: None
Fix Version/s: 0.20.0

Time Tracking:
Not Specified

File Attachments:
5248-testcase.patch   2009-02-19 05:02 PM   Devaraj Das   4 kB   (licensed for inclusion in ASF works)
5248.patch   2009-02-19 05:33 AM   Devaraj Das   0.9 kB   (licensed for inclusion in ASF works)
Issue Links:
Blocker
 

Resolution Date: 26/Feb/09 11:25 AM


Description
I observed a couple of times that a job's directories were not cleaned up after the job completed. From discussion, it seems that when a machine runs only reduces from a job and no maps, the TT does not get a signal from the JT to delete the job's files, so they can be left behind. FYI, JVM reuse was enabled at the time. I can confirm that the TT did not receive a 'KillJobAction'.

Comments
Hemanth Yamijala added a comment - 18/Feb/09 08:53 AM
After discussion with Sameer and Devaraj, marking this a blocker for Hadoop 0.20.

Devaraj Das added a comment - 19/Feb/09 05:33 AM
Attaching a patch. Testing in progress.

Devaraj Das added a comment - 19/Feb/09 05:02 PM
The earlier patch I submitted can be ignored. HADOOP-5247 addresses the problem better by broadcasting a KillJobAction to all trackers upon job termination. The attached patch is a testcase that currently fails but passes with the patch for HADOOP-5247.
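
For readers following along, here is a rough toy model of why a broadcast would close the gap described in this issue. It is not Hadoop code: the class and method names (Tracker, Master, finishJobSelective, finishJobBroadcast) are hypothetical, and the assumption that the master only signals trackers that ran maps is taken from the suspicion in the description, not from the actual JobTracker source.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy model only; every name here is hypothetical, not a Hadoop API.
public class KillJobBroadcastSketch {
    /** Stand-in for a TaskTracker holding per-job local directories. */
    static class Tracker {
        final Set<String> jobDirs = new HashSet<>();
        void runTask(String jobId) { jobDirs.add(jobId); }    // creates the job dir
        void killJob(String jobId) { jobDirs.remove(jobId); } // cleans it up
    }

    /** Stand-in for the JobTracker's cleanup decision. */
    static class Master {
        final List<Tracker> allTrackers = new ArrayList<>();
        final Map<String, Set<Tracker>> mapHosts = new HashMap<>();

        void runMap(Tracker t, String jobId) {
            t.runTask(jobId);
            mapHosts.computeIfAbsent(jobId, k -> new HashSet<>()).add(t);
        }

        void runReduce(Tracker t, String jobId) {
            t.runTask(jobId); // assumed gap: reduce-only hosts are not recorded
        }

        /** Signals only the trackers recorded as having run maps. */
        void finishJobSelective(String jobId) {
            for (Tracker t : mapHosts.getOrDefault(jobId, Collections.emptySet())) {
                t.killJob(jobId);
            }
        }

        /** Signals every known tracker, whether or not it ran a task. */
        void finishJobBroadcast(String jobId) {
            for (Tracker t : allTrackers) {
                t.killJob(jobId);
            }
        }
    }

    /** Returns true if the reduce-only tracker still holds the job dir. */
    static boolean leftoverAfter(boolean broadcast) {
        Master jt = new Master();
        Tracker mapHost = new Tracker();
        Tracker reduceOnlyHost = new Tracker();
        jt.allTrackers.add(mapHost);
        jt.allTrackers.add(reduceOnlyHost);

        String jobId = "job_0001"; // made-up job id
        jt.runMap(mapHost, jobId);
        jt.runReduce(reduceOnlyHost, jobId);

        if (broadcast) {
            jt.finishJobBroadcast(jobId);
        } else {
            jt.finishJobSelective(jobId);
        }
        return reduceOnlyHost.jobDirs.contains(jobId);
    }

    public static void main(String[] args) {
        System.out.println("selective leaves dir behind: " + leftoverAfter(false));
        System.out.println("broadcast leaves dir behind: " + leftoverAfter(true));
    }
}
```

In this model the selective path leaves the directory on the reduce-only tracker and the broadcast path does not, which matches the behaviour reported above.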

Devaraj Das added a comment - 26/Feb/09 11:25 AM
I just committed the testcase.

Hudson added a comment - 26/Feb/09 03:12 PM
Integrated in Hadoop-trunk #766 (see http://hudson.zones.apache.org/hudson/job/Hadoop-trunk/766/).
A testcase that checks for the existence of the job directory after the job completes. Fails if it exists. Contributed by Devaraj Das.
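
The shape of such a check can be sketched with plain java.nio, independent of Hadoop. This is only an illustration of "fail if the job directory still exists", not the committed testcase: the directory layout (one directory per job id under a local root), the job id, and the helper names are all assumptions made for the example.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Illustration only; layout and names are hypothetical, not Hadoop's.
public class JobDirCheckSketch {
    /** True if a directory named after the job still exists under localRoot. */
    static boolean jobDirExists(Path localRoot, String jobId) {
        return Files.isDirectory(localRoot.resolve(jobId));
    }

    /** Creates a scratch root containing one empty job directory. */
    static Path setUpJobDir(String jobId) {
        try {
            Path root = Files.createTempDirectory("tt-local");
            Files.createDirectory(root.resolve(jobId));
            return root;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Stands in for the cleanup a kill-job signal is expected to trigger. */
    static void cleanUp(Path localRoot, String jobId) {
        try {
            Files.deleteIfExists(localRoot.resolve(jobId));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String jobId = "job_0001"; // made-up job id
        Path root = setUpJobDir(jobId);
        cleanUp(root, jobId);      // "job completes" and cleanup runs
        if (jobDirExists(root, jobId)) {
            throw new AssertionError("job directory still exists: " + jobId);
        }
        System.out.println("job directory removed");
    }
}
```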