Issue Details (XML | Word | Printable)

Key: HADOOP-771
Type: Improvement Improvement
Status: Closed Closed
Resolution: Fixed
Priority: Major Major
Assignee: Mahadev konar
Reporter: Milind Bhandarkar
Votes: 0
Watchers: 1
Operations

If you were logged in you would be able to see more operations.
Hadoop Common

Namenode should return error when trying to delete non-empty directory

Created: 30/Nov/06 11:03 PM   Updated: 08/Jul/09 04:42 PM
Return to search
Component/s: None
Affects Version/s: 0.8.0
Fix Version/s: 0.17.0

Time Tracking:
Not Specified

File Attachments:
  Size
Text File Licensed for inclusion in ASF works Hadoop-771_2.patch 2008-02-26 10:54 PM Mahadev konar 14 kB
Text File Licensed for inclusion in ASF works Hadoop-771_3.patch 2008-03-04 01:36 AM Mahadev konar 14 kB
Text File Licensed for inclusion in ASF works Hadoop-771_4.patch 2008-03-05 07:43 PM Mahadev konar 15 kB
Text File Licensed for inclusion in ASF works Hadoop-771_5.patch 2008-03-07 02:54 AM Mahadev konar 93 kB
Text File Licensed for inclusion in ASF works Hadoop-771_6.patch 2008-03-07 08:47 PM Mahadev konar 95 kB
Text File Licensed for inclusion in ASF works Hadoop-771_7.patch 2008-03-09 06:08 PM Mahadev konar 95 kB
Text File Licensed for inclusion in ASF works Hadoop-771_8.patch 2008-03-09 06:26 PM Mahadev konar 95 kB
Text File Licensed for inclusion in ASF works patch_771_1.patch 2008-02-06 08:10 PM Mahadev konar 12 kB
Environment: all
Issue Links:
Reference
 

Hadoop Flags: Incompatible change
Release Note:
This patch adds a new api to file system i.e delete(path, boolean), deprecating the previous delete(path).
the new api recursively deletes files only if boolean is set to true.
If path is a file, the boolean value does not matter, if path is a directory and the directory is non empty delete(path, false) will throw an exception and delete(path, true) will delete all files recursively.
Resolution Date: 10/Mar/08 06:58 PM


 Description  « Hide
Currently, the namenode.delete() method allows recursive deletion of a directory. That is, even a non-empty directory could be deleted using namenode.delete(). To avoid costly programmer errors, the namenode should not remove the non-empty directories in this method. Recursively deleting directory should either be performed with listPaths() followed by a delete() for every path, or with a specific namenode method such as deleteRecursive().

 All   Comments   Work Log   Change History   Subversion Commits      Sort Order: Ascending order - Click to sort in descending order
Milind Bhandarkar added a comment - 18/May/07 02:55 AM
Can this be fixed ?
A programmer error can cost a lot here.
No other file system allows this, IMHO for a reason.
Having been burnt once too many times, I would like to disallow recursive delete completely.

Doug Cutting added a comment - 18/May/07 05:15 PM
If we get rid of recursive deletion then we'll still have a method somewhere that implements recursive deletion and folks will still call it and things will still get unintentionally deleted. I don't think that's the solution here.

I think the problem is that 'FileSystem#remove(new Path(""))' removes one's home directory. To address this we could simply prohibit the empty string as a Path. One should be forced to use at least "." to refer to the connected/default directory. Would that work?


Jim Kellerman added a comment - 18/May/07 05:31 PM
Doug Cutting wrote:
> I think the problem is that 'FileSystem#remove(new Path(""))' removes one's home directory. To address
> this we could simply prohibit the empty string as a Path. One should be forced to use at least "." to refer
> to the connected/default directory. Would that work?

+1


Milind Bhandarkar added a comment - 18/May/07 06:05 PM
>> think the problem is that 'FileSystem#remove(new Path(""))' removes one's home directory.

I think that's just one part of the problem.

A single RPC to delete an entire directory structure is the main problem, I think. If deleting a directory were to take multiple roundtrips between client and namenode, then it would be possible to ^C the client to stop wiping out the entire tree. No ?


Doug Cutting added a comment - 18/May/07 06:19 PM
There were a number of things that could have prevented this. First, I'm told this was triggered by a buggy script parser that parsed 'rm<space><space>foo' as 'rm(""); foo();'. Fixing that would avoid this. Second, the script execution engine should have used Hadoop's Trash feature, so that deletions go through an intermediate directory. Third, rm("") shouldn't delete one's home directory but instead throw an exception. And, finally, fourth, if we distinguished between 'rm' and 'rmr', this might not have happened, but there's a good chance that the script would have instead contained 'rmr<space><space>foo', and it still would have happened. So I think distinguishing between recursive and non-recursive file deletion is the weakest defense against this. It might not be a bad thing. Hadoop's own FsShell already makes this distinction. Perhaps Pig's scripting language should too, and perhaps Hadoop's APIs should even make this distinction. But I'd give the first three fixes much higher priority.

Milind Bhandarkar added a comment - 18/May/07 06:52 PM
I agree. The first three fixes should be given a higher priority. But eventually allowing recursive deletion with a single RPC should be prevented. It will also allow us to move the INode inner class outside of FSDirectory, and save on precious memory.

Hairong Kuang added a comment - 18/May/07 06:54 PM
Currently FileSystem APIs only support "rmr". To avoid programmatic error, should we change the FileSystem interface so that it distinguishes between rm and rmr? Instead of having "void delete(Path)", we could have "void delete(Path, boolean)". The boolean flag indicates if the deletion is recursive or not.

Doug Cutting added a comment - 18/May/07 07:19 PM
> A single RPC to delete an entire directory structure is the main problem

On the other hand, won't an RPC per file cause a lot of namenode traffic?


Milind Bhandarkar added a comment - 18/May/07 07:26 PM
>> On the other hand, won't an RPC per file cause a lot of namenode traffic?

yes, it would. But only when user intentionally deletes all his/her data, which I believe happens rarely.

I think, DFS is optimized for a small number of large files, rather than large number of small files, so in practice, this erasure of entire tree should not happen frequently.


Sameer Paranjpye added a comment - 18/May/07 07:37 PM
> If we get rid of recursive deletion then we'll still have a method somewhere that implements recursive deletion and folks will still call it and things will still get unintentionally
> deleted. I don't think that's the solution here.

I agree, even if we prohibit recursive deletion at the Namenode it'll get implemented in a library and called nonetheless. One might be able to abort a deletion in flight, but I don't believe that a user could react quickly enough to a mistaken deletion. The increase in the number of Namenode RPCs is an additional problem.

Hairong's suggestion for a boolean flag specifying recursion would be a good way to go, it would require programmers to be explicit about recursion and avoid additional RPC traffic.


Mahadev konar added a comment - 01/Feb/08 07:03 PM
so is implementing a deletion with recursive flag acceptable to all? we still keep the api with just delete for backwards compatibility.

Mahadev konar added a comment - 06/Feb/08 08:10 PM
this patch create a new filesystem api delte(path, boolean) which would throw an exception in case path is a directory, is non emppty and boolean is false. If boolean is set to true it behaves the same as previous delete. I have also deprecated delete(path). applications would use delete(path, boolean) instead. I have made changes to all the other file systems to implement them correctly.

Mahadev konar added a comment - 06/Feb/08 09:59 PM
i meant delete(path, boolean) (not delte(path, boolean)).

Mahadev konar added a comment - 26/Feb/08 10:54 PM
here is a mroe recent patch (applies to trunk) which fixes recursive deletes for s3 and cleans up some debug statements.

dhruba borthakur added a comment - 27/Feb/08 01:51 AM
We can deprecate all the 'delete(name)' APIs.

The description of the new public AP can describe better what happens when a recursive flag is set and the pathname is a file.


Mahadev konar added a comment - 04/Mar/08 01:36 AM
this patch fixes dhruba's comments.

dhruba borthakur added a comment - 04/Mar/08 09:12 AM
I think we should deprecate NameNode.delete(path) and DFSClient.delete(Path) as well. The callers have to use the new API that explicitly sets the "recursive" flag. We should remove FSNamesystem.delete(Path) altogether. Otherwise code looks good. +1.

Mahadev konar added a comment - 05/Mar/08 07:43 PM
uploading a new patch with dhruba's comments incorporated.

Mahadev konar added a comment - 06/Mar/08 08:51 PM
retrying hudson.

Hadoop QA added a comment - 06/Mar/08 10:01 PM
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12377190/Hadoop-771_4.patch
against trunk revision 619744.

@author +1. The patch does not contain any @author tags.

tests included +1. The patch appears to include 3 new or modified tests.

javadoc +1. The javadoc tool did not generate any warning messages.

javac -1. The applied patch generated 808 javac compiler warnings (more than the trunk's current 590 warnings).

release audit +1. The applied patch does not generate any new release audit warnings.

findbugs +1. The patch does not introduce any new Findbugs warnings.

core tests +1. The patch passed core unit tests.

contrib tests +1. The patch passed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1906/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1906/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1906/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1906/console

This message is automatically generated.


Mahadev konar added a comment - 06/Mar/08 11:40 PM
forgot to remove delete(path) uses!

Mahadev konar added a comment - 07/Mar/08 02:54 AM
updated patch getting rid of the javac warnings – replacing all the uses of delete(path) with delete(path, true).

Mahadev konar added a comment - 07/Mar/08 07:27 PM
trying hudson!!

Hadoop QA added a comment - 07/Mar/08 08:33 PM
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12377313/Hadoop-771_5.patch
against trunk revision 619744.

@author +1. The patch does not contain any @author tags.

tests included +1. The patch appears to include 210 new or modified tests.

javadoc +1. The javadoc tool did not generate any warning messages.

javac -1. The applied patch generated 605 javac compiler warnings (more than the trunk's current 592 warnings).

release audit +1. The applied patch does not generate any new release audit warnings.

findbugs +1. The patch does not introduce any new Findbugs warnings.

core tests +1. The patch passed core unit tests.

contrib tests +1. The patch passed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1912/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1912/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1912/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1912/console

This message is automatically generated.


Mahadev konar added a comment - 07/Mar/08 08:47 PM
attaching a new patch removing some more javac warnings.

Mahadev konar added a comment - 09/Mar/08 05:50 PM
trying hudson again....

Hadoop QA added a comment - 09/Mar/08 05:58 PM
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12377386/Hadoop-771_6.patch
against trunk revision 619744.

@author +1. The patch does not contain any @author tags.

tests included +1. The patch appears to include 216 new or modified tests.

patch -1. The patch command could not apply the patch.

Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1921/console

This message is automatically generated.


Mahadev konar added a comment - 09/Mar/08 06:08 PM
uploading the patch that applies to trunk.

Hadoop QA added a comment - 09/Mar/08 06:21 PM
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12377487/Hadoop-771_7.patch
against trunk revision 619744.

@author +1. The patch does not contain any @author tags.

tests included +1. The patch appears to include 216 new or modified tests.

javadoc +1. The javadoc tool did not generate any warning messages.

javac +1. The applied patch does not generate any new javac compiler warnings.

release audit +1. The applied patch does not generate any new release audit warnings.

findbugs -1. The patch appears to cause Findbugs to fail.

core tests -1. The patch failed core unit tests.

contrib tests -1. The patch failed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1922/testReport/
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1922/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1922/console

This message is automatically generated.


Mahadev konar added a comment - 09/Mar/08 06:26 PM
the last patch had a conflict... resolving and uploading a new one.

Hadoop QA added a comment - 09/Mar/08 07:39 PM
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12377489/Hadoop-771_8.patch
against trunk revision 619744.

@author +1. The patch does not contain any @author tags.

tests included +1. The patch appears to include 216 new or modified tests.

javadoc +1. The javadoc tool did not generate any warning messages.

javac -1. The applied patch generated 598 javac compiler warnings (more than the trunk's current 589 warnings).

release audit +1. The applied patch does not generate any new release audit warnings.

findbugs +1. The patch does not introduce any new Findbugs warnings.

core tests +1. The patch passed core unit tests.

contrib tests +1. The patch passed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1923/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1923/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1923/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1923/console

This message is automatically generated.


Mahadev konar added a comment - 09/Mar/08 08:02 PM
the javac patch warnings are just due to declaration of deprecated api and not the use.

dhruba borthakur added a comment - 10/Mar/08 06:58 PM
I just committed this. Thanks Mahadev.

Hudson added a comment - 12/Mar/08 12:18 PM

Tsz Wo (Nicholas), SZE added a comment - 07/Apr/08 10:29 PM
Should we deprecate FileUtil.fullyDelete(FileSystem fs, Path dir)? Is it the same as FileSystem.delete(path, recursive=true)?

Doug Cutting added a comment - 07/Apr/08 10:36 PM
> Should we deprecate FileUtil.fullyDelete(FileSystem fs, Path dir)?

+1. Not only is it redundant, but it still uses the pre-URI style of passing both a FileSystem and a Path, making it doubly-redundant too, in addition, as well and so on.


Tsz Wo (Nicholas), SZE added a comment - 07/Apr/08 10:47 PM
Created HADOOP-3202 for deprecating FileUtil.fullyDelete(FileSystem fs, Path dir).

Robert Chansler added a comment - 14/Apr/08 04:30 PM
Noted as incompatible in changes.txt