Issue Details (XML | Word | Printable)

Key: HADOOP-1915
Type: New Feature New Feature
Status: Closed Closed
Resolution: Fixed
Priority: Minor Minor
Assignee: Tom White
Reporter: Alejandro Abdelnur
Votes: 0
Watchers: 1
Operations

If you were logged in you would be able to see more operations.
Hadoop Common

adding counters methods using String (as opposed to Enum)

Created: 18/Sep/07 04:26 PM   Updated: 08/Jul/09 04:52 PM
Component/s: None
Affects Version/s: None
Fix Version/s: 0.18.0

Time Tracking:
Not Specified

File Attachments:
  Size
Text File Licensed for inclusion in ASF works hadoop-1915-v10.patch 2008-05-07 04:21 PM Tom White 17 kB
Text File Licensed for inclusion in ASF works hadoop-1915-v2.patch 2008-03-24 02:52 PM Tom White 10 kB
Text File Licensed for inclusion in ASF works hadoop-1915-v3.patch 2008-04-01 09:26 AM Tom White 10 kB
Text File Licensed for inclusion in ASF works hadoop-1915-v4.patch 2008-04-02 09:22 PM Tom White 13 kB
Text File Licensed for inclusion in ASF works hadoop-1915-v5.patch 2008-04-04 10:34 AM Tom White 13 kB
Text File Licensed for inclusion in ASF works hadoop-1915-v6.patch 2008-04-04 10:53 AM Tom White 13 kB
Text File Licensed for inclusion in ASF works hadoop-1915-v7.patch 2008-04-04 11:47 PM Owen O'Malley 13 kB
Text File Licensed for inclusion in ASF works hadoop-1915-v8.patch 2008-04-16 02:15 PM Tom White 13 kB
Text File Licensed for inclusion in ASF works hadoop-1915-v9.patch 2008-04-21 11:28 AM Tom White 13 kB
Text File Licensed for inclusion in ASF works hadoop-1915.patch 2008-03-13 01:02 PM Tom White 10 kB
Environment: all
Issue Links:
Blocker
Reference
 

Hadoop Flags: Reviewed
Release Note: Provided a new method to update counters. "incrCounter(String group, String counter, long amount)"
Resolution Date: 22/May/08 04:54 PM


 Description  « Hide
Currently to use the counters from within Map/Reduce code Enums have to be used, the Enum class defines the group and the Enum itself the counter. Internally they are converted to Strings (the class name and the enum toString) and you can retrieve them as strings from the client API.

Using dynamic counters (driven by configuration of the map/reduce) is not easy with the counters Enum based API. For example, currently I have an Enum class with 50 enums and we have to map the cardinality to the counter name on the client. This is cumbersome.

This could be easily improve by adding a String based counter method increment(String group, String counter, long count) to allow use of the counters without Enums.

Internally this method already exists, so the changes are minimal.



 All   Comments   Work Log   Change History   Subversion Commits      Sort Order: Ascending order - Click to sort in descending order
David Bowen added a comment - 18/Sep/07 05:15 PM
Couldn't you write a simple subclass of Enum with a static initialization method which reads the configuration and generates the Enum instances? It could also create a static hashmap so that the elements can be looked up by string.

I think that there would be a problem with the API you suggest, because the enum ordinal values would have to be generated on the fly, and we want them to be the same in different processes (as counters are transmitted between Java processes).


Owen O'Malley added a comment - 18/Sep/07 09:27 PM
Actually, this is necessary for pipes too. I think the enum interface is better in general, but there are a couple of use cases where strings are just easier to deal with.

Alejandro Abdelnur added a comment - 19/Sep/07 04:51 PM
As far as I know you can not subclass Enum. An Enum 'constants' are constants, cannot be added.

The trick I've done until now is to have an Enum class like the following:

---------------------
public enum X {
A,B,C, D, E, F;

private String name;

public String toString() { return name; }

private static int next = 0;

public static synchronized Enum getEnum(String name) { X x = X.values()[next++]; x.name = name; return x; }

}
---------------------

Everytime I need a new counter I invoke getEnum($NAME$).

By doing this I can set to the Enum the dynamic name and then from the running job I just get it by the dynamic name.

The problem i see now is the ordinal. I was not aware until your response that it is being use for collection instead the string value of the counter. as the counters are not declared up front but added dynamically it can be the case that in different tasks they are used in different order.

So, to be able to use String the ordinal would have to be removed from the serialized counter. Or, when aggregating counters from different task the names should be used instead the ordinal.


Tom White added a comment - 13/Mar/08 01:02 PM
I've created a patch for this. The main change is the removal of ordinal values for storing Counters in a Group. To do this I've replaced the sparse list of Counters (where the index is the ordinal) by a map of Counter name to Counter. I've also changed the Writable format of a Counter to send the non-localized name (rather than the localized name). Localization is done after deserialization.

In terms of public API changes I have added the following method to Reporter:

public abstract void incrCounter(String group, String counter, long amount);

and the corresponding method to Counters. Methods in Counters that take an ordinal have been deprecated.


Hadoop QA added a comment - 14/Mar/08 05:56 PM
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12377780/hadoop-1915.patch
against trunk revision 619744.

@author +1. The patch does not contain any @author tags.

tests included -1. The patch doesn't appear to include any new or modified tests.
Please justify why no tests are needed for this patch.

javadoc +1. The javadoc tool did not generate any warning messages.

javac +1. The applied patch does not generate any new javac compiler warnings.

release audit +1. The applied patch does not generate any new release audit warnings.

findbugs +1. The patch does not introduce any new Findbugs warnings.

core tests +1. The patch passed core unit tests.

contrib tests +1. The patch passed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1961/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1961/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1961/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1961/console

This message is automatically generated.


Owen O'Malley added a comment - 17/Mar/08 09:20 PM
Tom,
Your patch is a lot more expensive in terms of cpu time, especially in the merge of counters between tasks that is done in the Job Tracker. The ids for each counter were used to speed up access within each group. It probably makes sense to use a tree map instead of a hash map, so that merges between sets of counters can use the sorted nature of tree maps...

Of course, just adding the string-based method in the reporter with the ids would have been even easier. smile


Tom White added a comment - 24/Mar/08 12:49 PM
I've just profiled the patch to compare it to trunk and I don't see any speed penalty.

I ran a sort in the local job runner and got 118.4s with trunk vs. 118.6s with the patch, averaged across 5 runs. Using a sampling profiler, I measured 5ms of time in the Counters class with trunk, and <1ms with the patch.

Of course, just adding the string-based method in the reporter with the ids would have been even easier.

I wanted to keep the ordinal out of the public API, to make it easier to use, and potentially less error-prone (mixing up ordinals).

tests included -1. The patch doesn't appear to include any new or modified tests.

Please justify why no tests are needed for this patch.

The changes are tested by existing counters tests.


Tom White added a comment - 24/Mar/08 02:52 PM
Minor change to patch so it applies cleanly to trunk.

Hadoop QA added a comment - 24/Mar/08 05:30 PM
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12378489/hadoop-1915-v2.patch
against trunk revision 619744.

@author +1. The patch does not contain any @author tags.

tests included -1. The patch doesn't appear to include any new or modified tests.
Please justify why no tests are needed for this patch.

javadoc +1. The javadoc tool did not generate any warning messages.

javac +1. The applied patch does not generate any new javac compiler warnings.

release audit +1. The applied patch does not generate any new release audit warnings.

findbugs +1. The patch does not introduce any new Findbugs warnings.

core tests +1. The patch passed core unit tests.

contrib tests +1. The patch passed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2038/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2038/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2038/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2038/console

This message is automatically generated.


Owen O'Malley added a comment - 26/Mar/08 11:27 PM
Sort wouldn't show the problem at all. You would need to test with a job that is actually using counters. Try random writer with a small value size (0 bytes).

Tom White added a comment - 27/Mar/08 10:41 AM
With random writer with 0 length values it takes 46s with trunk and 46s with the patch (averaged over 3 runs). With a profiler I see that for trunk, it is spending 2% (716ms average) of the time in the Counters code (in Counters.findCounter). For the patch, it is spending 1% (485ms average) of the time in the Counters code (in Counters.findCounter).

Based on this, the patch looks OK.


Tom White added a comment - 01/Apr/08 09:26 AM
Synced with trunk.

Owen O'Malley added a comment - 01/Apr/08 07:55 PM
ok. I'm surprised, but that is why we check things. smile

Owen O'Malley added a comment - 01/Apr/08 08:06 PM
Tom, it looks like your code won't work with counters that don't have their property file on the JobTracker. Is my reading correct? That should work, otherwise users won't be able to make nice counters in their applications.

Tom White added a comment - 02/Apr/08 09:22 PM
Owen, you're right - the patch doesn't work properly with localization. Thanks for catching that.

I've created a new patch to deal with this by changing the external format of Counter to include displayName, but only if it differs from name:

name (false | true displayName) value

(The reason that we can't just use displayName is that we need name as the identifier - in the absence of ordinal - and there is no way of going from displayName to name.)

I've just tested this and localization is still not working - so this needs some more investigation before it's ready.


Enis Soztutar added a comment - 04/Apr/08 09:05 AM
Canceling the patch, since it is under discussion.

Tom White added a comment - 04/Apr/08 10:34 AM
New patch that works with localization. This was also a problem in trunk for the group name which was not being localized correctly. Counters#incrAllCounters was not preserving the display name.

Tom White added a comment - 04/Apr/08 10:53 AM
Patch that cleanly applies to trunk.

Hadoop QA added a comment - 04/Apr/08 05:34 PM
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12379375/hadoop-1915-v6.patch
against trunk revision 643282.

@author +1. The patch does not contain any @author tags.

tests included -1. The patch doesn't appear to include any new or modified tests.
Please justify why no tests are needed for this patch.

patch -1. The patch command could not apply the patch.

Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2164/console

This message is automatically generated.


Owen O'Malley added a comment - 04/Apr/08 11:47 PM
Fixed with respect to trunk.

Hadoop QA added a comment - 05/Apr/08 02:01 AM
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12379451/hadoop-1915-v7.patch
against trunk revision 643282.

@author +1. The patch does not contain any @author tags.

tests included -1. The patch doesn't appear to include any new or modified tests.
Please justify why no tests are needed for this patch.

javadoc +1. The javadoc tool did not generate any warning messages.

javac +1. The applied patch does not generate any new javac compiler warnings.

release audit +1. The applied patch does not generate any new release audit warnings.

findbugs -1. The patch appears to introduce 1 new Findbugs warnings.

core tests -1. The patch failed core unit tests.

contrib tests +1. The patch passed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2171/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2171/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2171/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2171/console

This message is automatically generated.


Nigel Daley added a comment - 07/Apr/08 04:28 AM
This feature missed 0.17 feature freeze.

Tom White added a comment - 16/Apr/08 02:15 PM
New patch (v8) to fix the FindBugs warning. The test failure was unrelated to this patch.

Hadoop QA added a comment - 16/Apr/08 05:34 PM
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12380282/hadoop-1915-v8.patch
against trunk revision 645773.

@author +1. The patch does not contain any @author tags.

tests included -1. The patch doesn't appear to include any new or modified tests.
Please justify why no tests are needed for this patch.

javadoc +1. The javadoc tool did not generate any warning messages.

javac +1. The applied patch does not generate any new javac compiler warnings.

release audit +1. The applied patch does not generate any new release audit warnings.

findbugs -1. The patch appears to introduce 1 new Findbugs warnings.

core tests +1. The patch passed core unit tests.

contrib tests +1. The patch passed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2255/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2255/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2255/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2255/console

This message is automatically generated.


Tom White added a comment - 21/Apr/08 11:28 AM
Another attempt to fix the FindBugs warning.

Hadoop QA added a comment - 22/Apr/08 08:19 PM
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12380597/hadoop-1915-v9.patch
against trunk revision 645773.

@author +1. The patch does not contain any @author tags.

tests included -1. The patch doesn't appear to include any new or modified tests.
Please justify why no tests are needed for this patch.

javadoc +1. The javadoc tool did not generate any warning messages.

javac +1. The applied patch does not generate any new javac compiler warnings.

release audit +1. The applied patch does not generate any new release audit warnings.

findbugs +1. The patch does not introduce any new Findbugs warnings.

core tests +1. The patch passed core unit tests.

contrib tests +1. The patch passed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2293/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2293/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2293/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2293/console

This message is automatically generated.


Owen O'Malley added a comment - 01/May/08 10:42 PM
I'm sorry for canceling this patch again, but it seems like it really should include a test that calls the new code.

Tom White added a comment - 07/May/08 04:21 PM
Patch with unit test for user defined counters.

Hadoop QA added a comment - 07/May/08 11:36 PM
+1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12381606/hadoop-1915-v10.patch
against trunk revision 654292.

+1 @author. The patch does not contain any @author tags.

+1 tests included. The patch appears to include 3 new or modified tests.

+1 javadoc. The javadoc tool did not generate any warning messages.

+1 javac. The applied patch does not increase the total number of javac compiler warnings.

+1 findbugs. The patch does not introduce any new Findbugs warnings.

+1 release audit. The applied patch does not increase the total number of release audit warnings.

+1 core tests. The patch passed core unit tests.

+1 contrib tests. The patch passed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2421/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2421/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2421/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2421/console

This message is automatically generated.


Hudson added a comment - 08/May/08 12:23 PM

Owen O'Malley added a comment - 22/May/08 04:54 PM
I forgot to resolve this bug. Thanks, Tom!