Description
It would be beneficial to make the Trash policy pluggable. One primary use-case for this is to archive files (in some remote store) when they get removed by Trash emptier.
Attachments
Attachments
- PluggableTrash_V2.patch
- 13 kB
- Usman Masood
- PluggableTrash_V3.patch
- 26 kB
- Usman Masood
- PluggableTrash_V4.patch
- 26 kB
- Usman Masood
- PluggableTrash_V5.patch
- 28 kB
- Usman Masood
- PluggableTrash_V6.patch
- 28 kB
- Usman Masood
- PluggableTrash.patch
- 28 kB
- Usman Masood
Issue Links
- blocks
-
HDFS-2150 Use new Trash Emptier methods
- Resolved
- is related to
-
HADOOP-8598 Server-side Trash
- Open
Activity
Patch to hadoop/common for this patch. For backwards compatibility checkpoint() and expunge() methods are included in the public Trash interface.
I think the patch looks generally good and this is a fine direction to pursue. But the Trash API is declared Stable, and removing the public constructor is an incompatible API change. Perhaps instead the default implementation should remain on the base class, but the factory should be used. That way, so long as folks are using the default implementation, existing code that calls 'new Trash()' will still work. That constructor can be deprecated, and the refactoring of the default implementation could be made later.
Can we have some kind of TrashFactory, which will create the instance of Trash policy implementaions.Bydefault it can create Trash.java instance. Trash.java class will not be changed and deprecate the constructor. If we configure some class for Trash policy(fs.trash.classname), then factory Should create that instance. We can define one Base interface for Trash policies and in Future let all Trash policy implemantations implement those APIs(looks you already identified the APIs) from Base trash interface.
In Future only TrashFactory will be used to get the Trash instance.
As Doug said, Presntly let dont distrurb the Trash class. So that no existing users will get effected.
Also, Currently Trash implementaions present in Common, So can we move this issue to common?
I'll change TrashDefault back Trash, and Trash to TrashPolicy. I think that should fix all concerns.
There should probably be a corresponding HDFS issue to update NameNode#startTrashEmptier() to use the new factory method, no?
Absolutely. I've already made that change. Just need to create a new issue and put up a patch.
This looks mostly reasonable to me. A few nits:
- The TrashPolicy#getInstance() implementations need javadoc comments.
- TrashPolicy#getEmptier() should perhaps not claim that, "only one checkpoint is kept", as not all implementations might implement that policy.
- Trash methods that override TrashPolicy methods should have an @Override and don't need javadoc unless it's different than that inherited from TrashPolicy.
I took a quick look at it. Some comments:
- Trash.java
- I prefer Trash uses/contains TrashPolicy insted Trash is TrashPolicy relationship. With this Trash does not need to have incompatible changes or deprecation. Trash could use TrashPolicyFactory to create a policy that it references. This also makes other code changes in this patch unnecessary.
- Why are you making some members non final? You could call from one constructor another constructor with this(...) and still keep it final without introducing initialize method?
- There are empty changes (with just space added).
- Indentation of Trash class definition is not quite right.
- TrashPolicy.java
- Can Trash#fs, Trash#homeParent, Trash#deletionInterval and any other members be common to all TrashPolicy and hence be moved to TrashPolicy?
- Move factory methods out of TrashPolicy to a separate class TrashPolicyFactory?
- What is the need to make TrashPolicy extend Configured?
- I am not sure what checkpoint() method really does - the javadoc is quite vague. Is the correct name snapshot for this? Is the right name for expunge() deleteSnapshots()?
- It does not look like getEmptier() method is needed? Just a start method on Trash should suffice right? Why expose emptier? Running the thread as part of Trash is better than trash policy?
One more nit adding to Cutting points:
Your patch contains some hard core tab characters, can you please correct?
try { - Trash trash = new Trash(home.getPath(), conf); + Trash trash = new Trash(home.getPath(), conf);
1. To preserve backward compatibility, I chose not to rename Trash to something like TrashPolicyDefault or something. If Trash uses/contains TrashPolicy then it doesn't make sense to have an implementation in it. I think it's best to separate implementation of a Trash policy from what the policy actually is. This is similar to how BlockPacementPolicy has been set up.
2. ReflectionUtils calls the no parameter constructor by default. Therefore we can only really "initialize" the instance after it been returned by ReflectionUtils.newInstance(...).
3. Trash was an instance of Configured, so I pushed that up to TrashPolicy. Another reason why this is useful is that ReflectionUtils does an internal check to see if the Class is an instanceof Configured and calls setConf(...). I don't think it's strictly needed though.
4. checkpoint(), expunge() and getEmptier() were part of the public interface exposed by Trash and since the interface was marked stable, renaming/removing these methods doesn't seem favorable right now. I also think eventually we should remove them from TrashPolicy because it doesn't make sense to expect every Trash policy to have a checkpoint mechanism.
5. If there's a startEmptier() method in Trash, we either fix it to run in a separate thread or in the same thread. By exposing a Runnable, we push this decision to the client which I think might be better.
I will fix the javadocs/indentation issues.
Moving factory methods from TrashPolicy to TrashPolicyFactory is something I'm open to. BlockPacementPolicy doesn't use a separate factory class, so maybe just for consistency reasons we can have them in TrashPolicy?
> To preserve backward compatibility, I chose not to rename Trash to something like TrashPolicyDefault or something. If Trash uses/contains TrashPolicy then it doesn't make sense to have an implementation in it. I think it's best to separate implementation of a Trash policy from what the policy actually is. This is similar to how BlockPacementPolicy has been set up.
I am not sure what you mean by this.
All I am suggesting is, use Trash as Facade. It reference TrashPolicy (base class) and points to the implementation of the TrashPolicy which provide specialization. Some what along the lines of -> http://en.wikipedia.org/wiki/Strategy_pattern.
Oh, I think I didn't get you the last time. Yeah I think adding a layer of indirection should definitely make this cleaner. We can rename checkpoint() and expunge() to create/deleteSnapshot() in TrashPolicy this way as well.
Though I'm just wondering if there are some clients which make some assumptions based on the implementation of Trash when they use it? This might break things for them.
I am glad you agree. By using Trash as Facade you can avoid the interface changes getting to the user.
> Though I'm just wondering if there are some clients which make some assumptions based on the implementation of Trash when they use it? This might break things for them.
Clearly articulating what Trash is and what it is not is a good start here. We could also in javadoc say, see the implementation of TrashPolicy for more details. I can help with you in this regard, in the next patch review.
Hi Usman,
New patch looks good to me...good job!
There are some points to address:
TrashPolicy.java
1. unused variable.
private static final Log LOG =
LogFactory.getLog(TrashPolicy.class);
Trash.java:
1. unused imports are there. can you please remove them?
2. I think previous Trash users can do setConf and getConf because it extends Configured. But now any way to handle it? TrashPolicy extended it, but as per my understanding User will deal with Trash only right.
3.Also unused log variable.
TrashPolicyDefault.java:
1. Here also please clean the unused imports.
Also after attaching the patch, you need to Submit it.
Uma, great point about Trash not extending Configured. Trash now extends Configured so that no unexpected behavior is seen by a client.
Removed unused vars/imports.
Can you Click on 'Submit Path' button.
So, that status will go from 'Open' to 'Patch Available' state. – Thanks
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12486738/PluggableTrash_V4.patch
against trunk revision 1147971.
+1 @author. The patch does not contain any @author tags.
-1 tests included. The patch doesn't appear to include any new or modified tests.
Please justify why no new tests are needed for this patch.
Also please list what manual steps were performed to verify this patch.
+1 javadoc. The javadoc tool did not generate any warning messages.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
-1 findbugs. The patch appears to introduce 1 new Findbugs (version 1.3.9) warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
+1 core tests. The patch passed core unit tests.
+1 system test framework. The patch passed system test framework compile.
Test results: https://builds.apache.org/job/PreCommit-HADOOP-Build/743//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HADOOP-Build/743//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/743//console
This message is automatically generated.
Can you please address the findbugs comment?
Also can we add some tests by configuring new test Trash policy? --Thanks
Patch looks good. Should the members fs, trash, current, deletionInterval etc be moved to TrashPolicy? T
The problem is that the constructor of Configured calls setConf(...) which I've overridden in Trash so that it invokes setConf(...) for trashPolicy as well. But this leads to the bug that FindBug highlights. I think this error will occur as long as a constructor calls an overridden method. But it doesn't make sense for us to setConf(...) of Trash and not change it for the underlying TrashPolicy. Any ideas how to work around it?
We don't really have another TrashPolicy, should I just set the conf variable fs.trash.classname to TrashPolicyDefault and see if that works as well?
Another quick comment - Should TrashPolicy be InterfaceAudience.Public? It should still retain InterfaceStability.Evolving though.
I think fs, trash can be moved up to TrashPolicy. Not sure about current, homesParent, deletionInterval though. Some TrashPolicy might not require these vars?
@Suresh - Initially I thought making TrashPolicy InterfaceAudience.Private would be better, but thinking about it more I don't really see any reason why TrashPolicy should only be accessible via Trash. So yeah, I think we can make itInterfaceAudience.Public.
> I think fs, trash can be moved up to TrashPolicy. Not sure about current, homesParent, deletionInterval though. Some TrashPolicy might not require these vars?
They can ignore it then right?
> but thinking about it more I don't really see any reason why TrashPolicy should only be accessible via Trash
That was not the reason why I was thinking about making it public. If new trash policy can be implemented by users, then it should be a public API.
current and homesParent seem too tied up to TrashPolicyDefault to be moved up to TrashPolicy. I don't think any other TrashPolicy would make use of these two vars. I will move fs, trash and deletionInterval to TrashPolicy.
I will also make TrashPolicy InterfaceAudience.Public.
Does anyone have any ideas for how to resolve the FindBug issue? I explained it in a previous comment.
Configured class should not call non final method in it's constructor. The problem starts there.
Do you expect the conf for TrashPolicy to change post construction? If not then you do not need to override the setConf() in Trash. Just call setConf() for TrashPolicy, post it's construction.
I know what the issue is, I was asking for ideas for a solution. The problem is if someone calls setConf(...) on Trash post construction, then it should also change the conf for TrashPolicy, no? If this isn't needed, then I can just remove the overridden setConf(...) in Trash and the issue will be resolved.
Fixed the FindBug issues:
- Removed the overridden setConf(...) method from Trash.
- Added a test case to test a pluggable TrashPolicy.
+1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12487397/PluggableTrash_V5.patch
against trunk revision 1148933.
+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 3 new or modified tests.
+1 javadoc. The javadoc tool did not generate any warning messages.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
+1 core tests. The patch passed core unit tests.
+1 system test framework. The patch passed system test framework compile.
Test results: https://builds.apache.org/job/PreCommit-HADOOP-Build/758//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HADOOP-Build/758//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/758//console
This message is automatically generated.
Minor comments:
- Please add @Override to TestTrashPolicy methods.
- Make Trash#getTrashPolicy() package private.
Will commit the new patch.
Can't change Trash#getTrashPolicy() to private because it is used by TestTrash.java. Removed the public, however, and not it has no access modifier.
+1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12487485/PluggableTrash_V6.patch
against trunk revision 1148933.
+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 3 new or modified tests.
+1 javadoc. The javadoc tool did not generate any warning messages.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
+1 core tests. The patch passed core unit tests.
+1 system test framework. The patch passed system test framework compile.
Test results: https://builds.apache.org/job/PreCommit-HADOOP-Build/759//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HADOOP-Build/759//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/759//console
This message is automatically generated.
> Can't change Trash#getTrashPolicy() to private because it is used by TestTrash.java.
You can change it to package private not private. TestTrash and Trash are in the same package.
Integrated in Hadoop-Common-trunk-Commit #696 (See https://builds.apache.org/job/Hadoop-Common-trunk-Commit/696/)
HADOOP-7460. Support pluggable trash policies. Contributed by Usman Masoon.
suresh : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1149760
Files :
- /hadoop/common/trunk/common/src/java/org/apache/hadoop/fs/TrashPolicy.java
- /hadoop/common/trunk/common/src/java/org/apache/hadoop/fs/Trash.java
- /hadoop/common/trunk/common/CHANGES.txt
- /hadoop/common/trunk/common/src/test/core/org/apache/hadoop/fs/TestTrash.java
- /hadoop/common/trunk/common/src/java/org/apache/hadoop/fs/TrashPolicyDefault.java
Integrated in Hadoop-Common-trunk #755 (See https://builds.apache.org/job/Hadoop-Common-trunk/755/)
HADOOP-7460. Support pluggable trash policies. Contributed by Usman Masoon.
suresh : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1149760
Files :
- /hadoop/common/trunk/common/src/java/org/apache/hadoop/fs/TrashPolicy.java
- /hadoop/common/trunk/common/src/java/org/apache/hadoop/fs/Trash.java
- /hadoop/common/trunk/common/CHANGES.txt
- /hadoop/common/trunk/common/src/test/core/org/apache/hadoop/fs/TestTrash.java
- /hadoop/common/trunk/common/src/java/org/apache/hadoop/fs/TrashPolicyDefault.java
The issue is to choose the right interface for pluggable Trash modules. Currently the public methods are:
The first two methods should be part of the Trash interface, but I'm not sure about the last two. Not every Trash policy should be required to implement a checkpoint mechanism.
Currently expunge() and checkpoint() are used by FsShell for the "-expunge" arg.