|
This is not a blocker for 0.17.0 (for which a release candidate is being voted on). Moving to 0.18
Did these counters work with Ganglia in a prior release? If so, then this is a regression and might be a candidate for 0.17.1...
They did not work correctly in 0.15.3 or in 0.16.0. It is a very minor change and I think people would like it.
The only other issue i currently have is that i have to set up my hadoop-metrics.properties files such that the report to node: The names being produced for ganglia are long and tend to overflow the bounding box in the graph.
mapred.job.FQCN.COUNTER_NAME where FQCN is the fully qualified class that contains the counter ie: org.apache.hadoop.mapred.XXX.YYY And it would be nice to get this from the individual machines as well as the cluster agreggate so you could tell at a glance how the individual machines were doing. I suspect that will be annother patch for that new feature. +1. This patch is very helpful. Are there any unit tests that tests the HadoopMetrics API?
Not as of now. Have opened a JIRA HADOOP-3634 for this. Should be easy if we model something along the lines of FileContext. Trunk has changed, would it be possible to regenerate this patch? We can make it PA once done.
Patch against more recent trunk (rev 701680). Combined the existing patch with the one submitter for #
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12391475/ganglia-patch-3422-and-4137.patch against trunk revision 701476. +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 Eclipse classpath. The patch retains Eclipse classpath integrity. -1 core tests. The patch failed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3433/testReport/ This message is automatically generated. Updated patch: Silly me, Ganglia has no long data type - and I mapped it to int32, which overflows quickly (especially when reporting bytes transferred!).
This now maps Long to float in Ganglia. I took a look at the patch. You have this in a few places '// TODO: Log this.'. Why not log the exception?
You keep a ThreadLocal StringBuffer to save on construction cost? Could you use a StringBuilder instead of a StringBuffer? Otherwise, this patch is great. I tried it. Ganglia graphs work with this in place. Hey stack,
1) I didn't add the log statements because I don't know the "preferred" way to add a logger to classes for this project. I hoped someone who knew would come along and help out. For now, use the original, not the updated patch - having the buffer overflow is better than nothing at all with floats. Good stuff Brian.
1). Just by way of FYI, see MetricsUtil in same package for how to add logging. Import two classes, then just inside your class opening, add something like: private static final Log LOG = LogFactory.getLog("org.apache.hadoop.util.GangliaContext"); In code, do something like this: LOG.warn("Type is null"); ... and so on. 2). Let me write Jason and see what he was up to with the ThreadLocal. I don't understand. Its not keeping any state so why bother with it. This patch removes the thread local code and uses StringBuilder instead of StringBuffer.
The core ganglia pieces are unchanged. At this point an outstanding issue is that it is not possible to distinguish the metrics from different jvms. What is the status of this issue and
The attached patch does the following:
1) Removes the TODO lines by adding the logger. 2) Incorporates 3) Includes Jason's updated patch. Builds cleanly against trunk. Couple of nitpicks: Use StringBuilder instead of StringBuffer and the set of StringBuffer length to zero on line #47 is not needed since you just created a new instance. Also, make comment on line #45 < 80 characters. Otherwise, I tried the patch against 0.19.0. All works.
What you think we should do about rpc and jvm 'record' names? They both use 'metrics'. Its kinda useless. Means all rpc and jvm metrics get lumped together. Other metrics records are better named: e.g. 'datanode', 'namenode', etc. Should we fix this as part of this patch or in a new issue? The jvm metrics record might be named for the application its hosting. And rpc the same? -1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12394071/ganglia-patch-3422-updated.patch against trunk revision 719152. +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 Eclipse classpath. The patch retains Eclipse classpath integrity. -1 core tests. The patch failed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3611/testReport/ This message is automatically generated. Seems that the failed test, TestFsck, is not robust. There was another failing in
Version of patch that addresses the above comments (Uses StringBuilder instead of StringBuffer, removes extraneous setLength(0), etc.). Also adds overload of JVMMetrics#init method so can pass name of record to use. Default is to use the old 'metrics' name. Overload makes it so can pass in alternate record name. This makes it so can have metrics per jvm rather than have all jvm metrics lumped together. For example, in hbase, we can now have graphs for both the master and regionserver's jvms show in ganglia. Doing similar for rpc didn't make as much sense since method names are shown (and override of RpcMetrics creation is not possible currently).
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12394440/ganglia-patch-3422-updated-v2.patch against trunk revision 719787. +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 Eclipse classpath. The patch retains Eclipse classpath integrity. +1 core tests. The patch passed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3635/testReport/ This message is automatically generated. I set HADOOP-4675 to depend on this, as we can't start supporting Ganglia 3.1 until these basic Ganglia issues are resolved.
I like stack's refinements. Am I allowed to vote +1 on it if I helped push the patch along? I guess probably not... > Am I allowed to vote +1 on it if I helped push the patch along? I guess probably not...
Yes, of course you are allowed to vote. In the case of a dispute, PMC votes trump contributors, but everyone is encouraged to vote, and their opinions are taken seriously. Edit release note for publication.
jvm.metrics.* instead of just using the last token name, or in the case of Per Job Counters, the string value, for all counters. The net result of this change is that the metrics names make it clear which part of hadoop is reporting, and the counter values each have their own RRD file instead of all the counters being aggregated under the name "value" |
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Modifies JobInProgress and GangliaContext