Issue Details (XML | Word | Printable)

Key: HADOOP-3400
Type: Improvement Improvement
Status: Closed Closed
Resolution: Fixed
Priority: Major Major
Assignee: dhruba borthakur
Reporter: dhruba borthakur
Votes: 0
Watchers: 1
Operations

If you were logged in you would be able to see more operations.
Hadoop Common

Facilitate creation of temporary files in HDFS

Created: 16/May/08 01:14 AM   Updated: 08/Jul/09 04:43 PM
Component/s: None
Affects Version/s: None
Fix Version/s: 0.18.0

Time Tracking:
Not Specified

File Attachments:
  Size
Text File Licensed for inclusion in ASF works deleteOnExit.patch 2008-05-20 06:40 AM dhruba borthakur 5 kB
Text File Licensed for inclusion in ASF works deleteOnExit.patch 2008-05-18 07:55 AM dhruba borthakur 5 kB
Text File Licensed for inclusion in ASF works deleteOnExit.patch 2008-05-16 11:46 PM dhruba borthakur 4 kB
Issue Links:
Reference
 

Hadoop Flags: Reviewed
Resolution Date: 22/May/08 05:03 AM


 Description  « Hide
There are a set of applications that use HDFS to create temporary files. The application would ideally like these files to be automatically deleted when the application process exits. This is similar to the File.deleteOnExit() in the Java API.

One proposal is to add a new method in FileSystem

public void deleteOnExit(Path)

This API requests that the file or directory denoted by this abstract pathname be deleted when the virtual machine terminates. Deletion will be attempted only for normal termination of the virtual machine, as defined by the Java Language Specification. Once deletion has been requested, it is not possible to cancel the request. This method should therefore be used with care.

This method can be implemented entirely in the client side code, e.g. FileSystem.java will keep a cache of all the pathnames specified by the above API. FileSystem.close will invoke delete() on all the pathnames found in the cache.



 All   Comments   Work Log   Change History   Subversion Commits      Sort Order: Ascending order - Click to sort in descending order
dhruba borthakur added a comment - 16/May/08 11:46 PM
This is a first version of the patch that still has some debugging messages. Early code review comments welcome. This introduces a new FileSystem API.

Tsz Wo (Nicholas), SZE added a comment - 17/May/08 12:41 AM
  • Would it better call it deleteOnClose?
  • FileSystem.delete(path) is deprecated. We should use FileSystem.delete(path, recursive). Should we let the user specify "recursive"?
  • If processDeleteOnExit() throws an IOException, then the FileSystem won't be closed properly and it will remain in the cache. I think processDeleteOnExit() should not throws exceptions.

dhruba borthakur added a comment - 18/May/08 07:55 AM
Incorporated Nicholas's comments. For now, I purposely kept the name deleteOnExit while making the javadoc explicitly state that the file will be deleted when either the JVM exits or the FileSystem is closed. I did this to maintain some sort of uniformity with the java File.deleteOnExit() API. But, if more people feel strongly about renaming this API as deleteOnClose, i will be happy do do that change.

I also kept the default behaviour to be recursive.

I enhanced the unit test to test for cases when deleteOnExit is called before/after the file is closed. Also tests the case when the filesystem is closed without closing the file. Also, test on LocalFileSystem as well as HDFS.


Hadoop QA added a comment - 18/May/08 09:28 AM
+1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12382258/deleteOnExit.patch
against trunk revision 656939.

+1 @author. The patch does not contain any @author tags.

+1 tests included. The patch appears to include 3 new or modified tests.

+1 javadoc. The javadoc tool did not generate any warning messages.

+1 javac. The applied patch does not increase the total number of javac compiler warnings.

+1 findbugs. The patch does not introduce any new Findbugs warnings.

+1 release audit. The applied patch does not increase the total number of release audit warnings.

+1 core tests. The patch passed core unit tests.

+1 contrib tests. The patch passed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2497/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2497/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2497/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2497/console

This message is automatically generated.


Tsz Wo (Nicholas), SZE added a comment - 19/May/08 05:46 PM
+1 patch looks good. Two minor comments:
  • change "TreeSet<Path> deleteOnExit " to "Set<Path> deleteOnExit " or "SortedSet<Path> deleteOnExit"
  • remove "throws IOException" from "protected void processDeleteOnExit() throws IOException"

dhruba borthakur added a comment - 20/May/08 06:40 AM
Incorporated Nicholas' comments,

dhruba borthakur added a comment - 20/May/08 06:41 AM
Incorporated Nicholas' code review comments.

dhruba borthakur added a comment - 20/May/08 06:41 AM
Incorporated Nicholas' code review comments.

Steve Loughran added a comment - 20/May/08 04:28 PM
While useful, this code is built on the assumptions that GC/shutdown happens cleanly,. and that when it does, the filesystem is visible. A long-lived filesystem will still pick up lots of orphan temp files.

What may be (eventually) useful would be the ability to give files an expiry time in their metadata, and have something intermittently walk the FS, deleting files that have expired.


Doug Cutting added a comment - 20/May/08 04:54 PM
> What may be (eventually) useful would be the ability to give files an expiry time in their metadata, and have something intermittently walk the FS, deleting files that have expired.

The trash mechanism already supports that. If you move something to a directory named .Trash/0807040000 in your home directory then HDFS (if configured with a non-zero fs.trash.interval) will remove it after the 4th of July. The API doesn't currently provide explicit support for this, but it would be easy to add a removeAfter(Path, Date) method.


dhruba borthakur added a comment - 21/May/08 05:19 PM
Re-submitting patch to trigger Hudson QA tests.

dhruba borthakur added a comment - 21/May/08 05:19 PM
Re-submitting patch to trigger Hudson QA tests.

dhruba borthakur added a comment - 21/May/08 07:09 PM
Responding to Steve' comments: I agree that deleteOnExit helps only if the JVM shuts down cleanly and the filesystem is visible. A big percentage of our use cases actually fall in this category. I would like the deleteOnExit API to be part of the 0.18 release and following that, start discussion on the removeAfter() API proposed by Doug.

Hadoop QA added a comment - 21/May/08 10:54 PM
+1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12382350/deleteOnExit.patch
against trunk revision 658862.

+1 @author. The patch does not contain any @author tags.

+1 tests included. The patch appears to include 3 new or modified tests.

+1 javadoc. The javadoc tool did not generate any warning messages.

+1 javac. The applied patch does not increase the total number of javac compiler warnings.

+1 findbugs. The patch does not introduce any new Findbugs warnings.

+1 release audit. The applied patch does not increase the total number of release audit warnings.

+1 core tests. The patch passed core unit tests.

+1 contrib tests. The patch passed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2510/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2510/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2510/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2510/console

This message is automatically generated.


dhruba borthakur added a comment - 22/May/08 05:03 AM
I just committed this.

Hudson added a comment - 22/May/08 12:24 PM