|
Actually, this is necessary for pipes too. I think the enum interface is better in general, but there are a couple of use cases where strings are just easier to deal with.
As far as I know you can not subclass Enum. An Enum 'constants' are constants, cannot be added.
The trick I've done until now is to have an Enum class like the following: --------------------- private String name; public String toString() { return name; } private static int next = 0; public static synchronized Enum getEnum(String name) { X x = X.values()[next++]; x.name = name; return x; } } Everytime I need a new counter I invoke getEnum($NAME$). By doing this I can set to the Enum the dynamic name and then from the running job I just get it by the dynamic name. The problem i see now is the ordinal. I was not aware until your response that it is being use for collection instead the string value of the counter. as the counters are not declared up front but added dynamically it can be the case that in different tasks they are used in different order. So, to be able to use String the ordinal would have to be removed from the serialized counter. Or, when aggregating counters from different task the names should be used instead the ordinal. I've created a patch for this. The main change is the removal of ordinal values for storing Counters in a Group. To do this I've replaced the sparse list of Counters (where the index is the ordinal) by a map of Counter name to Counter. I've also changed the Writable format of a Counter to send the non-localized name (rather than the localized name). Localization is done after deserialization.
In terms of public API changes I have added the following method to Reporter: public abstract void incrCounter(String group, String counter, long amount); and the corresponding method to Counters. Methods in Counters that take an ordinal have been deprecated. -1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12377780/hadoop-1915.patch against trunk revision 619744. @author +1. The patch does not contain any @author tags. tests included -1. The patch doesn't appear to include any new or modified tests. javadoc +1. The javadoc tool did not generate any warning messages. javac +1. The applied patch does not generate any new javac compiler warnings. release audit +1. The applied patch does not generate any new release audit warnings. findbugs +1. The patch does not introduce any new Findbugs warnings. core tests +1. The patch passed core unit tests. contrib tests +1. The patch passed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1961/testReport/ This message is automatically generated. Tom,
Your patch is a lot more expensive in terms of cpu time, especially in the merge of counters between tasks that is done in the Job Tracker. The ids for each counter were used to speed up access within each group. It probably makes sense to use a tree map instead of a hash map, so that merges between sets of counters can use the sorted nature of tree maps... Of course, just adding the string-based method in the reporter with the ids would have been even easier. smile I've just profiled the patch to compare it to trunk and I don't see any speed penalty.
I ran a sort in the local job runner and got 118.4s with trunk vs. 118.6s with the patch, averaged across 5 runs. Using a sampling profiler, I measured 5ms of time in the Counters class with trunk, and <1ms with the patch.
I wanted to keep the ordinal out of the public API, to make it easier to use, and potentially less error-prone (mixing up ordinals).
The changes are tested by existing counters tests. -1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12378489/hadoop-1915-v2.patch against trunk revision 619744. @author +1. The patch does not contain any @author tags. tests included -1. The patch doesn't appear to include any new or modified tests. javadoc +1. The javadoc tool did not generate any warning messages. javac +1. The applied patch does not generate any new javac compiler warnings. release audit +1. The applied patch does not generate any new release audit warnings. findbugs +1. The patch does not introduce any new Findbugs warnings. core tests +1. The patch passed core unit tests. contrib tests +1. The patch passed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2038/testReport/ This message is automatically generated. Sort wouldn't show the problem at all. You would need to test with a job that is actually using counters. Try random writer with a small value size (0 bytes).
With random writer with 0 length values it takes 46s with trunk and 46s with the patch (averaged over 3 runs). With a profiler I see that for trunk, it is spending 2% (716ms average) of the time in the Counters code (in Counters.findCounter). For the patch, it is spending 1% (485ms average) of the time in the Counters code (in Counters.findCounter).
Based on this, the patch looks OK. ok. I'm surprised, but that is why we check things. smile
Tom, it looks like your code won't work with counters that don't have their property file on the JobTracker. Is my reading correct? That should work, otherwise users won't be able to make nice counters in their applications.
Owen, you're right - the patch doesn't work properly with localization. Thanks for catching that.
I've created a new patch to deal with this by changing the external format of Counter to include displayName, but only if it differs from name: name (false | true displayName) value (The reason that we can't just use displayName is that we need name as the identifier - in the absence of ordinal - and there is no way of going from displayName to name.) I've just tested this and localization is still not working - so this needs some more investigation before it's ready. Canceling the patch, since it is under discussion.
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12379375/hadoop-1915-v6.patch against trunk revision 643282. @author +1. The patch does not contain any @author tags. tests included -1. The patch doesn't appear to include any new or modified tests. patch -1. The patch command could not apply the patch. Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2164/console This message is automatically generated. Fixed with respect to trunk.
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12379451/hadoop-1915-v7.patch against trunk revision 643282. @author +1. The patch does not contain any @author tags. tests included -1. The patch doesn't appear to include any new or modified tests. javadoc +1. The javadoc tool did not generate any warning messages. javac +1. The applied patch does not generate any new javac compiler warnings. release audit +1. The applied patch does not generate any new release audit warnings. findbugs -1. The patch appears to introduce 1 new Findbugs warnings. core tests -1. The patch failed core unit tests. contrib tests +1. The patch passed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2171/testReport/ This message is automatically generated. This feature missed 0.17 feature freeze.
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12380282/hadoop-1915-v8.patch against trunk revision 645773. @author +1. The patch does not contain any @author tags. tests included -1. The patch doesn't appear to include any new or modified tests. javadoc +1. The javadoc tool did not generate any warning messages. javac +1. The applied patch does not generate any new javac compiler warnings. release audit +1. The applied patch does not generate any new release audit warnings. findbugs -1. The patch appears to introduce 1 new Findbugs warnings. core tests +1. The patch passed core unit tests. contrib tests +1. The patch passed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2255/testReport/ This message is automatically generated. -1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12380597/hadoop-1915-v9.patch against trunk revision 645773. @author +1. The patch does not contain any @author tags. tests included -1. The patch doesn't appear to include any new or modified tests. javadoc +1. The javadoc tool did not generate any warning messages. javac +1. The applied patch does not generate any new javac compiler warnings. release audit +1. The applied patch does not generate any new release audit warnings. findbugs +1. The patch does not introduce any new Findbugs warnings. core tests +1. The patch passed core unit tests. contrib tests +1. The patch passed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2293/testReport/ This message is automatically generated. I'm sorry for canceling this patch again, but it seems like it really should include a test that calls the new code.
+1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12381606/hadoop-1915-v10.patch against trunk revision 654292. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 3 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2421/testReport/ This message is automatically generated. Integrated in Hadoop-trunk #484 (See http://hudson.zones.apache.org/hudson/job/Hadoop-trunk/484/
I forgot to resolve this bug. Thanks, Tom!
|
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
I think that there would be a problem with the API you suggest, because the enum ordinal values would have to be generated on the fly, and we want them to be the same in different processes (as counters are transmitted between Java processes).