Issue Details (XML | Word | Printable)

Key: HADOOP-4675
Type: Improvement Improvement
Status: Open Open
Priority: Major Major
Assignee: Brian Bockelman
Reporter: Brian Bockelman
Votes: 3
Watchers: 12
Operations

If you were logged in you would be able to see more operations.
Hadoop Common

Current Ganglia metrics implementation is incompatible with Ganglia 3.1

Created: 18/Nov/08 04:41 AM   Updated: 16/Sep/09 02:58 PM
Return to search
Component/s: metrics
Affects Version/s: 0.21.0
Fix Version/s: 0.21.0

Time Tracking:
Not Specified

File Attachments:
  Size
Text File Licensed for inclusion in ASF works hadoop-4675-2.patch 2008-11-25 02:14 AM Brian Bockelman 14 kB
Text File Licensed for inclusion in ASF works hadoop-4675-3.patch 2008-11-25 12:12 PM zhangwei 10 kB
Text File Licensed for inclusion in ASF works HADOOP-4675-4.patch 2008-12-23 05:50 AM zhangwei 9 kB
Text File Licensed for inclusion in ASF works HADOOP-4675-v5.patch 2009-03-27 08:46 PM Brian Bockelman 14 kB
Text File Licensed for inclusion in ASF works HADOOP-4675-v6.patch 2009-03-29 12:45 AM Brian Bockelman 20 kB
Text File Licensed for inclusion in ASF works HADOOP-4675-v7.patch 2009-05-05 02:22 AM Brian Bockelman 16 kB
Text File Licensed for inclusion in ASF works HADOOP-4675-v8.patch 2009-08-19 06:46 AM Scott Beardsley 15 kB
Text File Licensed for inclusion in ASF works HADOOP-4675-v9.patch 2009-09-04 12:40 AM Matt Massie 15 kB
Text File Licensed for inclusion in ASF works hadoop-4675.patch 2008-11-24 02:39 PM Brian Bockelman 14 kB
Issue Links:
Dependants
 

Release Note: Support for reporting metrics to Ganglia 3.1 servers
Tags: ganglia metrics


 Description  « Hide
Ganglia changed its wire protocol in the 3.1.x series; the current implementation only works for 3.0.x.

 All   Comments   Work Log   Change History   Subversion Commits      Sort Order: Ascending order - Click to sort in descending order
Brian Bockelman added a comment - 18/Nov/08 04:45 AM
Looking at the new Ganglia wire protocol, we'll probably need to do the following:
1) Implement more XDR serializations. 3.0.x protocol sends all data as strings; 3.1.x correctly sends float data as XDR floats.
2) Test and verify sending Longs as floats in Ganglia actually works. This has been discussed, but I'm not sure it's entirely functional... maybe get a test case going?
3) Probably this will result in a new GangliaContext31 class, as I don't see any clean way to merge the two protocols together into one class.
  • The patch for HADOOP-3422 caused a new metric name to be generated in the emitRecord class; we could reuse this if it was separated out into a new function.

Brian


Brian Bockelman added a comment - 24/Nov/08 02:39 PM
The attached patch (which includes the patch for HADOOP-3422 in order to make things work...) should provide Ganglia 3.1 support.

Note that this patch probably will be conflicting in Hudson until HADOOP-3422 lands in trunk. I'll rebase things then.


Brian Bockelman added a comment - 25/Nov/08 02:14 AM
Fix a build issue.

zhangwei added a comment - 25/Nov/08 10:24 AM
I test the patch OK.And I suggest to add the group attribute in the extra data
the metric in Ganglia3.1.* has the attribute,so i think to put the prefix of the metric name to the group name.
for example:
metric :dfs.namenode.FilesCreated --> group: dfs.namenode

zhangwei added a comment - 25/Nov/08 11:23 AM - edited
hadoop-4675-3.patch :add group attribute which use the prefix of the metric name.

Brian Bockelman added a comment - 25/Nov/08 03:57 PM
Hey zhangwei,

The group attribute stuff is a good improvement. Have you tested it on your cluster? I'll perhaps get to do this today or tomorrow; we're going to be upgrading to Hadoop 0.19.0 soon, which will allow me to slip a few new patches in.

Brian


zhangwei added a comment - 26/Nov/08 02:15 AM
Hi, Brian

I tested it on my cluster,which has ganglia 3.1.1 installed.I add a little code simply based on your patch:

+ String GroupName = name.substring(0,name.lastIndexOf("."))
. . . . . .
xdr_int(dmax); // dmax, the maximum data value
//xdr_string(""); // Empty extra_value field for Ganglia 3.1.
+ xdr_int(1); //this means Num Of the Element follow by
+ xdr_string("GROUP");
+ xdr_string(GroupName);
for (SocketAddress socketAddress : metricsServers) {
DatagramPacket packet

And it seems OK.


Brian Bockelman added a comment - 16/Dec/08 03:19 PM
I can confirm that the alterations to my patch do indeed appear to work.

Submitting patch to Hudson.


zhangwei added a comment - 23/Dec/08 05:41 AM
I follow the wiki "HowToContribute" step by step and create the new patch.I hope it woks

zhangwei added a comment - 23/Dec/08 05:50 AM
the patch's content use chinese chars: 工作副本 which means "revision"
I translate into english.

Steve Loughran added a comment - 27/Jan/09 02:50 PM
  1. does this break ganglia 3.0?
  2. The code works out its hostname in a static constructor
+        InetAddress localMachine = InetAddress.getLocalHost();
+        hostName = localMachine.getHostName();

For 0.20 and onwards, this patch should use the code in org.apache.hadoop.net.DNS, rather than try and do this itself. The DNS class has caching and fallback for machines that don't know who they are.


Brian Bockelman added a comment - 19/Mar/09 10:29 PM
Hey Steve,

1) This does not break Ganglia 3.0. It gives a subclass of the Ganglia 3.0 implementation that is compatible with 3.1.

2) I'll have to work on this - we're currently using the patch for 0.19, so I haven't looked at the DNS class yet. Can you give an example (or fix the patch yourself) ?


Brian Bockelman added a comment - 27/Mar/09 08:46 PM
I believe this works - Steve, can you look over my usage of the DNS class?

Brian Bockelman added a comment - 28/Mar/09 09:06 AM
Ok - I think the patch is ready - why is there no "Submit Patch" from JIRA? Did I do something wrong?

Brian Bockelman added a comment - 29/Mar/09 12:45 AM
Hm - found a bug in the v5 patch. This next one currently uses the DNS class correctly as Steve suggested.

This next one includes a test case for Ganglia – it's quite simple. All it does is configure the MiniDFSCluster to report to Ganglia31, and then starts up a separate thread which listens for metrics. It verifies that the hostname is properly set inside the packet sent to Ganglia, but does not verify that all the formatting is correct.

This appears to be the first test for Ganglia? Probably should have written it awhile ago.


Steve Loughran added a comment - 30/Mar/09 01:09 PM
Looks good to me, +1.

One thing to consider in future is moving the code to listen (and later validate) the ganglia 3.1 messages should be moved somewhere where it can be redistributed/reused in other tests and tools. That can wait until there is more validation.


Brian Bockelman added a comment - 09/Apr/09 04:10 PM
Ok, submitting version 6 to Hudson. Includes a test case and correct DNS usage. Go Hudson, go!

Hadoop QA added a comment - 09/Apr/09 10:30 PM
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12404072/HADOOP-4675-v6.patch
against trunk revision 763728.

+1 @author. The patch does not contain any @author tags.

+1 tests included. The patch appears to include 3 new or modified tests.

-1 patch. The patch command could not apply the patch.

Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/174/console

This message is automatically generated.


Brian Bockelman added a comment - 05/May/09 02:22 AM
v7 of the patch - I think I've fixed the 'patch' issue in Hudson - just a bum development environment on my laptop.

Hadoop QA added a comment - 08/May/09 06:25 AM
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12407207/HADOOP-4675-v7.patch
against trunk revision 772688.

+1 @author. The patch does not contain any @author tags.

+1 tests included. The patch appears to include 3 new or modified tests.

+1 javadoc. The javadoc tool did not generate any warning messages.

+1 javac. The applied patch does not increase the total number of javac compiler warnings.

-1 findbugs. The patch appears to introduce 3 new Findbugs warnings.

+1 Eclipse classpath. The patch retains Eclipse classpath integrity.

-1 release audit. The applied patch generated 476 release audit warnings (more than the trunk's current 475 warnings).

-1 core tests. The patch failed core unit tests.

-1 contrib tests. The patch failed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/296/testReport/
Release audit warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/296/artifact/trunk/current/releaseAuditDiffWarnings.txt
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/296/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/296/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/296/console

This message is automatically generated.


Hadoop QA added a comment - 08/May/09 10:25 AM
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12407207/HADOOP-4675-v7.patch
against trunk revision 772844.

+1 @author. The patch does not contain any @author tags.

+1 tests included. The patch appears to include 3 new or modified tests.

+1 javadoc. The javadoc tool did not generate any warning messages.

+1 javac. The applied patch does not increase the total number of javac compiler warnings.

-1 findbugs. The patch appears to introduce 3 new Findbugs warnings.

+1 Eclipse classpath. The patch retains Eclipse classpath integrity.

+1 release audit. The applied patch does not increase the total number of release audit warnings.

+1 core tests. The patch passed core unit tests.

-1 contrib tests. The patch failed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/299/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/299/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/299/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/299/console

This message is automatically generated.


Johan Oskarsson added a comment - 28/May/09 03:27 PM

Scott Beardsley added a comment - 19/Aug/09 06:46 AM
add support for Ganglia 3.1 metrics

Scott Beardsley added a comment - 19/Aug/09 06:49 AM
These minor changes in this[1] patch should fix the findbugs warnings in previous versions. Tested against rev 805676 (current trunk)

[1] https://issues.apache.org/jira/secure/attachment/12416987/HADOOP-4675-v8.patch

--------
[exec] +1 overall.
[exec]
[exec] +1 @author. The patch does not contain any @author tags.
[exec]
[exec] +1 tests included. The patch appears to include 3 new or modified tests.
[exec]
[exec] +1 javadoc. The javadoc tool did not generate any warning messages.
[exec]
[exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings.
[exec]
[exec] +1 findbugs. The patch does not introduce any new Findbugs warnings.
[exec]
[exec] +1 release audit. The applied patch does not increase the total number of release audit warnings.


Hadoop QA added a comment - 19/Aug/09 12:07 PM
+1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12416987/HADOOP-4675-v8.patch
against trunk revision 804918.

+1 @author. The patch does not contain any @author tags.

+1 tests included. The patch appears to include 3 new or modified tests.

+1 javadoc. The javadoc tool did not generate any warning messages.

+1 javac. The applied patch does not increase the total number of javac compiler warnings.

+1 findbugs. The patch does not introduce any new Findbugs warnings.

+1 release audit. The applied patch does not increase the total number of release audit warnings.

+1 core tests. The patch passed core unit tests.

+1 contrib tests. The patch passed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/613/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/613/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/613/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/613/console

This message is automatically generated.


stack added a comment - 27/Aug/09 08:36 PM
I will commit this patch in next day or so unless objection.

George Porter added a comment - 02/Sep/09 08:27 PM
Any word on whether this is in the trunk now? Thanks!

Hudson added a comment - 02/Sep/09 10:09 PM
Integrated in Hadoop-Common-trunk-Commit #9 (See http://hudson.zones.apache.org/hudson/job/Hadoop-Common-trunk-Commit/9/)
Current Ganglia metrics implementation is incompatible with Ganglia 3.1

Hudson added a comment - 02/Sep/09 10:19 PM
Integrated in Hadoop-Common-trunk-Commit #10 (See http://hudson.zones.apache.org/hudson/job/Hadoop-Common-trunk-Commit/10/)
Current Ganglia metrics implementation is incompatible with Ganglia 3.1

stack added a comment - 02/Sep/09 10:22 PM
I applied the patch but then noticed that the unit test isn't being run; its got the wrong package on it and so applies outside of src/test/core. It then fails to compile because it uses minidfscluster it in the 'correct' location – under src/test/core. I reversed the patch until this is addressed.

Hudson added a comment - 02/Sep/09 10:29 PM
Integrated in Hadoop-Common-trunk-Commit #11 (See http://hudson.zones.apache.org/hudson/job/Hadoop-Common-trunk-Commit/11/)
Current Ganglia metrics implementation is incompatible with Ganglia 3.1 – reversing this patch

stack added a comment - 04/Sep/09 12:17 AM
If someone can fix the patch so the ganglia test is applied to src/test/core and not to src/test I will apply (Will have to do another spin through hudson....). This patch has been working well for me for months.

Matt Massie added a comment - 04/Sep/09 12:40 AM
Trivial change to drop TestGangliaContext31 into src/test/core instead of src/test per stack's request.

Scott Beardsley added a comment - 04/Sep/09 01:24 AM
Thanks Matt, you beat me to it.

stack added a comment - 04/Sep/09 04:16 AM
Thanks lads.

@Matt: Does it compile for you? I get this when I apply to current common TRUNK:

[javac] Compiling 126 source files to /Users/stack/checkouts/common/trunk/build/test/core/classes
    [javac] /Users/stack/checkouts/common/trunk/src/test/core/org/apache/hadoop/metrics/TestGangliaContext31.java:31: package org.apache.hadoop.hdfs does not exist
    [javac] import org.apache.hadoop.hdfs.MiniDFSCluster;
    [javac]                              ^

Here's my version:

URL: https://svn.apache.org/repos/asf/hadoop/common/trunk
Repository Root: https://svn.apache.org/repos/asf
Repository UUID: 13f79535-47bb-0310-9956-ffa450edef68
Revision: 811214
...

Common doesn't bundle hdfs. I'm doing something wrong or the test no longer suits after the refactoring of hadoop into different modules.

If no minidfscluster available in common, then the test doesn't fit the new constrained environment (though the test is creative with its setup of a listener to catch and verify the ganglia emissions). I'd be grand w/ removing the test. What you all think?


Matt Massie added a comment - 04/Sep/09 07:02 AM
@stack: the patch compiles if you run, e.g...
CLASSPATH="../hadoop-hdfs/build/hadoop-hdfs-0.21.0-dev.jar:../hadoop-hdfs/build/hadoop-hdfs-test-0.21.0-dev.jar" ant bin-package

You're correct that MiniDFSCluster is not a part of hadoop-common but rather hadoop-hdfs.

In the future, I will run something like the following to test patches since I think it closely mimics Hudson, e.g.

ant test-patch -Dpatch.file=HADOOP-4675-v9.patch  -Dfindbugs.home=... -Djava5.home=... -Dforrest.home=...

where none of the non-hadoop-common jars are in the classpath.

I read through your test and agree: it is creative. It's a shame that MiniDFSCluster isn't available in common. Maybe the test should be commented out until Mini clusters are available for common tests? It's not ideal but having this patch linger on without resolution isn't ideal either. The code in the patch shouldn't be prone to bugs since it's simply sending specially crafted UDP messages. I trust that Java sockets have been thoroughly tested and the xdr_ functions are very straight-forward (e.g. xdr_int is basically htonl()).

Btw, your approach to sending the ganglia meta data each time is practical. It can be optimized in the future but that might add unwelcome complexity.


Tom White added a comment - 04/Sep/09 04:05 PM
There are a couple of options I can see.

1. Convert the test to be more of a unit test of GangliaContext31. So rather than use MiniDFSCluster, inject metrics directly into the context and check that they are correctly received by the socket listener.
2. Move the test to the HDFS project as an integration test.

Ideally we'd do both, since 1 is a unit test for Ganglia, and 2 tests the integration of metrics and HDFS.


Brian Bockelman added a comment - 16/Sep/09 02:39 AM
Hey Tom,

I'm a bit confused on what to do next. Are you suggesting that we mark this as WONT FIX and re-open for HDFS? It's been about 9 months since I submitted the patch, so I'm mentally a bit behind.

Brian


Tom White added a comment - 16/Sep/09 02:58 PM

Are you suggesting that we mark this as WONT FIX and re-open for HDFS?

No, I think writing a unit test for GangliaContext31 (that doesn't use MiniDFSCluster) is needed to commit this. At a later point we could then follow up with a separate HDFS issue that has the test using MiniDFSCluster.