Issue Details (XML | Word | Printable)

Key: HADOOP-3422
Type: Bug Bug
Status: Closed Closed
Resolution: Fixed
Priority: Major Major
Assignee: stack
Reporter: Jason
Votes: 0
Watchers: 3
Operations

If you were logged in you would be able to see more operations.
Hadoop Common

Ganglia counter metrics are all reported with the metric name "value", so the counter values can not be seen

Created: 20/May/08 05:35 PM   Updated: 23/Apr/09 07:17 PM
Return to search
Component/s: metrics
Affects Version/s: 0.17.2, 0.18.0, 0.18.1, 0.18.2
Fix Version/s: 0.20.0

Time Tracking:
Original Estimate: 1h
Original Estimate - 1h
Remaining Estimate: 1h
Remaining Estimate - 1h
Time Spent: Not Specified
Remaining Estimate - 1h

File Attachments:
  Size
Text File Licensed for inclusion in ASF works diff-20080520-1025.txt 2008-05-20 05:36 PM Jason 3 kB
Text File Licensed for inclusion in ASF works ganglia-patch-3422-and-4137.patch 2008-10-07 11:19 PM Brian Bockelman 4 kB
Text File Licensed for inclusion in ASF works ganglia-patch-3422-and-4137.patch 2008-10-04 06:18 PM Brian Bockelman 4 kB
Text File Licensed for inclusion in ASF works ganglia-patch-3422-qand-4137-1.patch 2008-10-20 04:56 AM Jason 2 kB
Text File Licensed for inclusion in ASF works ganglia-patch-3422-updated-v2.patch 2008-11-21 07:07 PM stack 5 kB
Text File Licensed for inclusion in ASF works ganglia-patch-3422-updated.patch 2008-11-17 04:55 PM Brian Bockelman 3 kB
Issue Links:
Dependants
 
Duplicate
 
Reference
 

Hadoop Flags: Reviewed, Incompatible change
Release Note: Changed names of ganglia metrics to avoid conflicts and to better identify source function.
Resolution Date: 25/Nov/08 10:41 PM


 Description  « Hide
The JobInProgress class reports all metrics with the name "value". The FileMetrics class puts all of the tags into the name when reporting the individual values, but the Ganglia Context does not put the tags into the name..

This patch modifies the context to build names for the counter metrics out of the tag values. This enables the user to see the indivdual counter values with the ganglia web tool, on a per job basis



 All   Comments   Work Log   Change History   Subversion Commits      Sort Order: Ascending order - Click to sort in descending order
Jason added a comment - 20/May/08 05:36 PM
A patch against trunk to provide this.
Modifies JobInProgress and GangliaContext

Nigel Daley added a comment - 20/May/08 05:40 PM
This is not a blocker for 0.17.0 (for which a release candidate is being voted on). Moving to 0.18

Doug Cutting added a comment - 20/May/08 09:30 PM
Did these counters work with Ganglia in a prior release? If so, then this is a regression and might be a candidate for 0.17.1...

Jason added a comment - 20/May/08 09:44 PM
They did not work correctly in 0.15.3 or in 0.16.0. It is a very minor change and I think people would like it.

The only other issue i currently have is that i have to set up my hadoop-metrics.properties files such that the report to node:
jvm.servers=HOST:8649 ..., where HOST is the head node, if i use localhost, the metrics end up all in a bucket under the psuedo machine localhost.localdomain
If the HOST is a real host, they show up under the node's statistics


Jason added a comment - 21/May/08 02:18 PM
The names being produced for ganglia are long and tend to overflow the bounding box in the graph.
mapred.job.FQCN.COUNTER_NAME
where FQCN is the fully qualified class that contains the counter ie: org.apache.hadoop.mapred.XXX.YYY

And it would be nice to get this from the individual machines as well as the cluster agreggate so you could tell at a glance how the individual machines were doing. I suspect that will be annother patch for that new feature.


dhruba borthakur added a comment - 29/May/08 05:25 PM
+1. This patch is very helpful. Are there any unit tests that tests the HadoopMetrics API?

Lohit Vijayarenu added a comment - 24/Jul/08 03:41 PM

Are there any unit tests that tests the HadoopMetrics API?

Not as of now. Have opened a JIRA HADOOP-3634 for this. Should be easy if we model something along the lines of FileContext.


Lohit Vijayarenu added a comment - 24/Jul/08 08:02 PM
Trunk has changed, would it be possible to regenerate this patch? We can make it PA once done.

Brian Bockelman added a comment - 04/Oct/08 06:18 PM
Patch against more recent trunk (rev 701680). Combined the existing patch with the one submitter for #HADOOP-4137; without both, Ganglia metrics appear to be broken.

Hadoop QA added a comment - 04/Oct/08 11:28 PM
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12391475/ganglia-patch-3422-and-4137.patch
against trunk revision 701476.

+1 @author. The patch does not contain any @author tags.

-1 tests included. The patch doesn't appear to include any new or modified tests.
Please justify why no tests are needed for this patch.

+1 javadoc. The javadoc tool did not generate any warning messages.

+1 javac. The applied patch does not increase the total number of javac compiler warnings.

+1 findbugs. The patch does not introduce any new Findbugs warnings.

+1 Eclipse classpath. The patch retains Eclipse classpath integrity.

-1 core tests. The patch failed core unit tests.

+1 contrib tests. The patch passed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3433/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3433/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3433/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3433/console

This message is automatically generated.


Brian Bockelman added a comment - 07/Oct/08 11:19 PM
Updated patch: Silly me, Ganglia has no long data type - and I mapped it to int32, which overflows quickly (especially when reporting bytes transferred!).

This now maps Long to float in Ganglia.


stack added a comment - 10/Oct/08 06:00 AM
I took a look at the patch. You have this in a few places '// TODO: Log this.'. Why not log the exception?

You keep a ThreadLocal StringBuffer to save on construction cost? Could you use a StringBuilder instead of a StringBuffer?

Otherwise, this patch is great. I tried it. Ganglia graphs work with this in place.


Brian Bockelman added a comment - 11/Oct/08 03:26 AM
Hey stack,

1) I didn't add the log statements because I don't know the "preferred" way to add a logger to classes for this project. I hoped someone who knew would come along and help out.
2) I don't know the logic behind the ThreadLocal StringBuffer stuff. Adoption from the original patch.
3) The update to the patch which maps Long to float is bust. The class doesn't have a way to serialize floats with XDR, so I'm not sure what would happen.

For now, use the original, not the updated patch - having the buffer overflow is better than nothing at all with floats.


stack added a comment - 11/Oct/08 04:38 AM
Good stuff Brian.

1). Just by way of FYI, see MetricsUtil in same package for how to add logging. Import two classes, then just inside your class opening, add something like:

private static final Log LOG =
    LogFactory.getLog("org.apache.hadoop.util.GangliaContext");

In code, do something like this: LOG.warn("Type is null"); ... and so on.

2). Let me write Jason and see what he was up to with the ThreadLocal. I don't understand. Its not keeping any state so why bother with it.
3). Doesn't your patch also address HADOOP-4137 whereas diff-20080520-1025.txt does not?


Jason added a comment - 20/Oct/08 04:56 AM
This patch removes the thread local code and uses StringBuilder instead of StringBuffer.

The core ganglia pieces are unchanged.

At this point an outstanding issue is that it is not possible to distinguish the metrics from different jvms.


Chris Douglas added a comment - 13/Nov/08 11:43 PM
What is the status of this issue and HADOOP-4137? The latter patch has been absorbed into the one posted here, but the two fixes appear independent (and the latest conflict). I'm canceling both patches for now, since it is unclear whether either is to be committed.

Brian Bockelman added a comment - 17/Nov/08 04:55 PM
The attached patch does the following:
1) Removes the TODO lines by adding the logger.
2) Incorporates HADOOP-4137. I'll mark that one as a duplicate of this.
3) Includes Jason's updated patch.

Builds cleanly against trunk.


stack added a comment - 18/Nov/08 06:52 PM
Couple of nitpicks: Use StringBuilder instead of StringBuffer and the set of StringBuffer length to zero on line #47 is not needed since you just created a new instance. Also, make comment on line #45 < 80 characters. Otherwise, I tried the patch against 0.19.0. All works.

What you think we should do about rpc and jvm 'record' names? They both use 'metrics'. Its kinda useless. Means all rpc and jvm metrics get lumped together. Other metrics records are better named: e.g. 'datanode', 'namenode', etc. Should we fix this as part of this patch or in a new issue?

The jvm metrics record might be named for the application its hosting. And rpc the same?


Hadoop QA added a comment - 20/Nov/08 07:38 PM
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12394071/ganglia-patch-3422-updated.patch
against trunk revision 719152.

+1 @author. The patch does not contain any @author tags.

-1 tests included. The patch doesn't appear to include any new or modified tests.
Please justify why no tests are needed for this patch.

+1 javadoc. The javadoc tool did not generate any warning messages.

+1 javac. The applied patch does not increase the total number of javac compiler warnings.

+1 findbugs. The patch does not introduce any new Findbugs warnings.

+1 Eclipse classpath. The patch retains Eclipse classpath integrity.

-1 core tests. The patch failed core unit tests.

+1 contrib tests. The patch passed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3611/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3611/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3611/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3611/console

This message is automatically generated.


Tsz Wo (Nicholas), SZE added a comment - 20/Nov/08 11:27 PM
Seems that the failed test, TestFsck, is not robust. There was another failing in HADOOP-4647. I believe both of the failing cases are not related to the patches posted.

stack added a comment - 20/Nov/08 11:33 PM
I agree the test failure is unrelated but would like to post another patch anyways to address comments above.

stack added a comment - 21/Nov/08 07:07 PM
Version of patch that addresses the above comments (Uses StringBuilder instead of StringBuffer, removes extraneous setLength(0), etc.). Also adds overload of JVMMetrics#init method so can pass name of record to use. Default is to use the old 'metrics' name. Overload makes it so can pass in alternate record name. This makes it so can have metrics per jvm rather than have all jvm metrics lumped together. For example, in hbase, we can now have graphs for both the master and regionserver's jvms show in ganglia. Doing similar for rpc didn't make as much sense since method names are shown (and override of RpcMetrics creation is not possible currently).

stack added a comment - 21/Nov/08 07:07 PM
Try hudson w/ new version of patch.

Hadoop QA added a comment - 23/Nov/08 01:46 AM
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12394440/ganglia-patch-3422-updated-v2.patch
against trunk revision 719787.

+1 @author. The patch does not contain any @author tags.

-1 tests included. The patch doesn't appear to include any new or modified tests.
Please justify why no tests are needed for this patch.

+1 javadoc. The javadoc tool did not generate any warning messages.

+1 javac. The applied patch does not increase the total number of javac compiler warnings.

+1 findbugs. The patch does not introduce any new Findbugs warnings.

+1 Eclipse classpath. The patch retains Eclipse classpath integrity.

+1 core tests. The patch passed core unit tests.

+1 contrib tests. The patch passed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3635/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3635/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3635/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3635/console

This message is automatically generated.


Brian Bockelman added a comment - 24/Nov/08 02:43 PM
I set HADOOP-4675 to depend on this, as we can't start supporting Ganglia 3.1 until these basic Ganglia issues are resolved.

I like stack's refinements. Am I allowed to vote +1 on it if I helped push the patch along? I guess probably not...


Doug Cutting added a comment - 24/Nov/08 05:35 PM
> Am I allowed to vote +1 on it if I helped push the patch along? I guess probably not...

Yes, of course you are allowed to vote. In the case of a dispute, PMC votes trump contributors, but everyone is encouraged to vote, and their opinions are taken seriously.


stack added a comment - 24/Nov/08 06:59 PM
Thanks for the review Brian.

Unless objection, I'll commit to TRUNK in the next day or so.

Should I commit to branch-0.19 too? I'd like to but its only a 'major' bug, not critical nor a blocker.


stack added a comment - 25/Nov/08 10:41 PM
Committed. Thanks for the patch Jason and Brian.

Robert Chansler added a comment - 03/Mar/09 07:00 PM
Edit release note for publication.

jvm.metrics.*
mapred.job.Job Counters .*
mapred.job.Map-Reduce Framework.*
mapred.job.Map-Reduce Framework.*
mapred.job.Map-Reduce Framework.*
mapred.jobtracker.*
mapred.shuffleInput.*
mapred.tasktracker.*

instead of just using the last token name, or in the case of Per Job Counters, the string value, for all counters.

The net result of this change is that the metrics names make it clear which part of hadoop is reporting, and the counter values each have their own RRD file instead of all the counters being aggregated under the name "value"