|
[
Permlink
| « Hide
]
Hemanth Yamijala added a comment - 25/Jun/09 09:55 AM
Initial proposal is to keep it dead simple:
There are atleast two issues that were discussed about this approach in HADOOP-5919:
With the project split over now, maybe the first issue requires a solution as part of this JIRA. Owen's suggestion was to define a key like hadoop.conf.extra.classes which would be a list of class names that will be loaded by Configuration when it is loaded. By default this could be null, but in a cluster installation, we could put up basic classes like JobConf. This would give an opportunity for the extra classes to add more mappings to the new keys. The second problem is a bit more involved, though some obvious solutions exist. And we could take the approach that we will not solve it in this patch, but only restrict the utility of this framework for the more straightforward mapping cases. Thoughts ? I'm not enamored of this approach and would like to propose
a slightly heavier-weight, but, I think, cleaner approach than stuffing more logic into the Configuration class. My apologies for coming to this conversation a bit late. If you don't want to read a long e-mail, skip down to the code examples Before I get to the proposal, I wanted to lay out what I think the goals
My proposal is to represent every configuration variable that's accessed in public interface ConfigValue<T> { T get(Configuration conf); T getDefault(); void set(Configuration conf, T value); String getHelp(); } There's more than one way to implement this. Here's one proposal that uses @ConfigDescription(help="Some help text", visibility=Visibility.PUBLIC) @ConfigAccessors({ @ConfigAccessor(name="common.sample"), @ConfigAccessor(name="core.sample", deprecated="Use common.sample instead") }) public final static ConfigVariable<Integer> myConfigVariable = ConfigVariables.newIntConfigVariable(15 /* default value */); This approach would require pre-processing (at build time) the annotations I'm half-way to getting this working, and I actually think something @ConfigVariableDeclaration public final static ConfigVariable<URI> fsDefaultName = ConfigVariableBuilder.newURI() .setDefault(null) .setHelp("Default filesystem") .setVisibility(Visibility.PUBLIC) .addAccessor("fs.default.name") .addDeprecatedAccessor("core.default.fs", "Use foo instead") .addValidator(new ValidateSupportedFilesystem()); This would still require build-time preprocessing (javac supports A drawback of this approach is how to handle the defaults that I think this can be implemented relatively quickly, with little impact on What do you think? Philip, this seems a very ambitious change to the Configuration framework. Before I can comment further, I would like to take a look at the exact JIRA where this is being discussed. Though I could find out a couple of JIRAs that seem related, none had all the points that you've mentioned here. Can you point me to such a jira, if it exists ?
If not, I think it is important to open a new JIRA to discuss these points. That way it will find a better audience and we can discuss better. Please let me know about this. Hemanth,
This JIRA is about backwards-compatibility of deprecated keys, which is something my comment addresses, so I thought it fit in well here. Think of it as an alternative solution to the problem you're trying to solve by keeping the map of deprecated keys in Configuration.java. Keeping a deprecation map is expedient and simple, but I think it may hamper a better, longer-term solution. The design goals above are "out of thin air" (in the sense that they haven't been discussed on JIRA outside of the JIRAs mentioned above and MAPREDUCE-475), though I hope they're reasonable. They were discussed a bit at http://wiki.apache.org/hadoop/DeveloperOffsite20090612 I very very much want there to be a path to be able to rename configuration keys, but I want to make sure that the solution that comes out of this JIRA is compatible with some future work. – Philip Phillip,
I think this is too much structure for what is gained. In particular, it replaces a relatively simple string to string map with a lot of code. Where does all of the code go? How does it interact with user's job conf? How do we document it? I think we should go ahead with the simple approach for now... Owen,
Apologies for missing this e-mail for so long. I'm behind on the "all-jira" bucket, and I failed to set a watch. Hemanth, you should definitely forge ahead with the simple, expedient solution. I'd like to convince you and Owen that the more complicated proposal is a net win (and I've used a similar system in the past), but I think the best way to do that is to actually write the code and transform a few usages. I've been busy with some other deadlines, so when I get there, I'll file a JIRA and bother you all again. (To answer Owen's questions: the couple of classes for ConfigVariable go into the configuration package; users are welcome to use the same classes to set their variables, or they can set them manually; the documentation for the variables themselves is generated, the documentation for the system lives in JavaDoc on the individual classes and the package.) – Philip Assumptions:
1. None of the *-default.xml files would have deprecated keys. ======== get(name): the key(name) is checked in the deprecation map. If present, set(name,value): the key (name) is checked in the deprecation map. If ========
Note: the set and unset in the above table refers to the keys being You might consider logging a warning every time you run into (either via get or set) a "set" deprecated key.
The first comment defines the initial proposal , Which says that when both new and old values are set in configuration we give preference
to new values. This we felt is not correct , we should give preference to old value.As we are trying to be backward compatible here. So in configuration if have both deprecated key and new key , new key's values would be replaced by the deprecated key's values. Now in taking the above consideration in mind we have 2 ways of solving this. 1.Try to maintain a single set of key value always .When we have both deprecated key and new key , only maintain new key .So when we are trying to read the Configuration xmls , if we encounter any deprecated key , we simply replace new key's value with the deprecated key's value. 2. Always give preference to the deprecated key . While set of new values check if deprecated keys are present , if yes , then simply set deprecated values. We would like to go ahead with option 1. Where we make sure that old is preferred at the start while loading the configuration , after that it To make the above proposals more clear:
for example: 1. Always maintain new keys only. At the time of loading configuration xml , if A and B are present , we simply , replace B's values with A's value.
2.Always give preference to deprecated keys. If we have both A and B present in configuration , give preference to A always.
Any comments ? I think you have to think about how Hadoop's notion of a "final" flag interacts with this, too. If a system administrator has set either A or B to be final, then that value must override any user-submitted value, regardless of which was set first.
Assumptions/Requirements:
The following table depicts the various cases arising, keeping in view whether the key in concern is final and if the key is deprecated: NOTE:
Uploading the patch
Changes look good ,
some minor comments: Configuration.java 1.Check for null in addDeprecation method. 3.Try to use foreach kind of syntax where ever possible.It makes code look simpler. TestConfiguration.java 1.In tearDown method , you should delete the test-config.xml file also 1.Diff has Index: junitvmwatcher8758949811953274592.properties Diff also has test-config3.xml Uploaded the patch with the above suggested modifications done.
Some changes made in the new patch are:
Took a look at the patch file:
The code changes looks fine. Following are the comments with respect to test case: In testcase do assert that get of the old and new key matches correctly. We just need to do the following for these cases. Add test case to check user code case also, i.e the configuration resource has only new keys. Then you can add deprecated keys using addDeprecation and then do get set and get of old keys. Uploading the new patch with the above suggestions being implemented.
uploaded patch with the following test cases being added:
1. User sets an old key and gets new key. 2. User sets a new key and gets old key. The results should be the value which is set most recently, i.e., value corresponding to old key in the first case and value corresponding to new key in the second case. Took a look at the lastest patch, the last test condition i.e. simulation of the user setting deprecated value and framework using new value is not being asserted, can you please add the assert condition there?
Apart from that all other changes look fine. Uploading the patch with the above mentioned correction done.
The changes looks fine to me.
+1 to the patch. -1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12416391/HADOOP-6105-4.patch against trunk revision 804918. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 6 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. -1 findbugs. The patch appears to introduce 1 new Findbugs warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/609/testReport/ This message is automatically generated. uploading the patch with the findbugs warning fixed.
+1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12416861/HADOOP-6105-5.patch against trunk revision 804918. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 6 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/611/testReport/ This message is automatically generated. The previous patch does not handle deprecations for mapreduce and hdfs. In order to support the deprecations in mapreduce and hdfs, a new key hadoop.conf.extra.classes is being added in core-site.xml. This key can have values as class names of classes from mapreduce and hdfs which extend Configuration and whose static blocks can have deprecations by using addDeprecation method. These classes will be loaded using reflection, thus ensuring that deprecations added in their static blocks are being considered.
Uploading a patch with the above changes made. Took a look at the patch following are the comments which I had on the same:
We should also take care that deprecated keys are not going to be reloaded and reprocessed with Configuration.reload() method.
Serious enough to log in WARN mode actually. Suppressing exception is fine, I suppose.
Changes introduced in the new patch:
The changes look fine to me.
The changes look mostly fine. I have just a few minor comments:
Uploading patch with the above comments implemented.
The warning message will be printed for every set and also for first get of deprecated key after every reload of configuration. +1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12418793/HADOOP-6105-10.patch against trunk revision 812031. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 6 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h4.grid.sp2.yahoo.net/20/testReport/ This message is automatically generated. Integrated in Hadoop-Common-trunk-Commit #15 (See http://hudson.zones.apache.org/hudson/job/Hadoop-Common-trunk-Commit/15/
. Adds support for automatically handling deprecation of configuration keys. Contributed by V.V.Chaitanya Krishna. I committed this to trunk. Thanks, Chaitanya !
Integrated in Hadoop-Common-trunk #89 (See http://hudson.zones.apache.org/hudson/job/Hadoop-Common-trunk/89/
. Adds support for automatically handling deprecation of configuration keys. Contributed by V.V.Chaitanya Krishna. |
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||