Issue Details (XML | Word | Printable)

Key: HADOOP-4675
Type: Improvement Improvement
Status: Open Open
Priority: Major Major
Assignee: Brian Bockelman
Reporter: Brian Bockelman
Votes: 3
Watchers: 12
Operations

If you were logged in you would be able to see more operations.
Hadoop Common

Current Ganglia metrics implementation is incompatible with Ganglia 3.1

Created: 18/Nov/08 04:41 AM   Updated: 16/Sep/09 02:58 PM
Return to search
Component/s: metrics
Affects Version/s: 0.21.0
Fix Version/s: 0.21.0

Time Tracking:
Not Specified

File Attachments:
  Size
Text File Licensed for inclusion in ASF works hadoop-4675-2.patch 2008-11-25 02:14 AM Brian Bockelman 14 kB
Text File Licensed for inclusion in ASF works hadoop-4675-3.patch 2008-11-25 12:12 PM zhangwei 10 kB
Text File Licensed for inclusion in ASF works HADOOP-4675-4.patch 2008-12-23 05:50 AM zhangwei 9 kB
Text File Licensed for inclusion in ASF works HADOOP-4675-v5.patch 2009-03-27 08:46 PM Brian Bockelman 14 kB
Text File Licensed for inclusion in ASF works HADOOP-4675-v6.patch 2009-03-29 12:45 AM Brian Bockelman 20 kB
Text File Licensed for inclusion in ASF works HADOOP-4675-v7.patch 2009-05-05 02:22 AM Brian Bockelman 16 kB
Text File Licensed for inclusion in ASF works HADOOP-4675-v8.patch 2009-08-19 06:46 AM Scott Beardsley 15 kB
Text File Licensed for inclusion in ASF works HADOOP-4675-v9.patch 2009-09-04 12:40 AM Matt Massie 15 kB
Text File Licensed for inclusion in ASF works hadoop-4675.patch 2008-11-24 02:39 PM Brian Bockelman 14 kB
Issue Links:
Dependants
 

Release Note: Support for reporting metrics to Ganglia 3.1 servers
Tags: ganglia metrics


 Description  « Hide
Ganglia changed its wire protocol in the 3.1.x series; the current implementation only works for 3.0.x.

 All   Comments   Work Log   Change History   Subversion Commits      Sort Order: Ascending order - Click to sort in descending order
Brian Bockelman added a comment - 18/Nov/08 04:45 AM
Looking at the new Ganglia wire protocol, we'll probably need to do the following:
1) Implement more XDR serializations. 3.0.x protocol sends all data as strings; 3.1.x correctly sends float data as XDR floats.
2) Test and verify sending Longs as floats in Ganglia actually works. This has been discussed, but I'm not sure it's entirely functional... maybe get a test case going?
3) Probably this will result in a new GangliaContext31 class, as I don't see any clean way to merge the two protocols together into one class.
  • The patch for HADOOP-3422 caused a new metric name to be generated in the emitRecord class; we could reuse this if it was separated out into a new function.

Brian


Brian Bockelman added a comment - 24/Nov/08 02:39 PM
The attached patch (which includes the patch for HADOOP-3422 in order to make things work...) should provide Ganglia 3.1 support.

Note that this patch probably will be conflicting in Hudson until HADOOP-3422 lands in trunk. I'll rebase things then.


Brian Bockelman made changes - 24/Nov/08 02:39 PM
Field Original Value New Value
Attachment hadoop-4675.patch [ 12394559 ]
Brian Bockelman made changes - 24/Nov/08 02:39 PM
Link This issue depends on HADOOP-3422 [ HADOOP-3422 ]
Brian Bockelman added a comment - 25/Nov/08 02:14 AM
Fix a build issue.

Brian Bockelman made changes - 25/Nov/08 02:14 AM
Attachment hadoop-4675-2.patch [ 12394626 ]
zhangwei added a comment - 25/Nov/08 10:24 AM
I test the patch OK.And I suggest to add the group attribute in the extra data
the metric in Ganglia3.1.* has the attribute,so i think to put the prefix of the metric name to the group name.
for example:
metric :dfs.namenode.FilesCreated --> group: dfs.namenode

zhangwei added a comment - 25/Nov/08 11:23 AM - edited
hadoop-4675-3.patch :add group attribute which use the prefix of the metric name.

zhangwei made changes - 25/Nov/08 11:23 AM
Attachment hadoop-4675-3.patch [ 12394644 ]
zhangwei made changes - 25/Nov/08 12:11 PM
Attachment hadoop-4675-3.patch [ 12394644 ]
zhangwei made changes - 25/Nov/08 12:12 PM
Attachment hadoop-4675-3.patch [ 12394647 ]
Brian Bockelman added a comment - 25/Nov/08 03:57 PM
Hey zhangwei,

The group attribute stuff is a good improvement. Have you tested it on your cluster? I'll perhaps get to do this today or tomorrow; we're going to be upgrading to Hadoop 0.19.0 soon, which will allow me to slip a few new patches in.

Brian


zhangwei added a comment - 26/Nov/08 02:15 AM
Hi, Brian

I tested it on my cluster,which has ganglia 3.1.1 installed.I add a little code simply based on your patch:

+ String GroupName = name.substring(0,name.lastIndexOf("."))
. . . . . .
xdr_int(dmax); // dmax, the maximum data value
//xdr_string(""); // Empty extra_value field for Ganglia 3.1.
+ xdr_int(1); //this means Num Of the Element follow by
+ xdr_string("GROUP");
+ xdr_string(GroupName);
for (SocketAddress socketAddress : metricsServers) {
DatagramPacket packet

And it seems OK.


Brian Bockelman added a comment - 16/Dec/08 03:19 PM
I can confirm that the alterations to my patch do indeed appear to work.

Submitting patch to Hudson.


Brian Bockelman made changes - 16/Dec/08 03:19 PM
Fix Version/s 0.20.0 [ 12313438 ]
Fix Version/s 0.19.1 [ 12313473 ]
Status Open [ 1 ] Patch Available [ 10002 ]
zhangwei added a comment - 23/Dec/08 05:41 AM
I follow the wiki "HowToContribute" step by step and create the new patch.I hope it woks

zhangwei made changes - 23/Dec/08 05:41 AM
Attachment HADOOP-4675-4.patch [ 12396653 ]
zhangwei made changes - 23/Dec/08 05:48 AM
Attachment HADOOP-4675-4.patch [ 12396653 ]
zhangwei added a comment - 23/Dec/08 05:50 AM
the patch's content use chinese chars: 工作副本 which means "revision"
I translate into english.

zhangwei made changes - 23/Dec/08 05:50 AM
Attachment HADOOP-4675-4.patch [ 12396654 ]
Steve Loughran added a comment - 27/Jan/09 02:50 PM
  1. does this break ganglia 3.0?
  2. The code works out its hostname in a static constructor
+        InetAddress localMachine = InetAddress.getLocalHost();
+        hostName = localMachine.getHostName();

For 0.20 and onwards, this patch should use the code in org.apache.hadoop.net.DNS, rather than try and do this itself. The DNS class has caching and fallback for machines that don't know who they are.


Jeff Hammerbacher made changes - 27/Jan/09 04:13 PM
Comment [ I'll be out of the country starting Friday, 1/23, returning Monday,
2/2. I will not be reachable at all while I am gone. Please contact
Jeff Hammerbacher (hammer@cloudera.com) or Philip Zeyliger
(philip@cloudera.com) in my leave.

Thanks! I will get back to you in February.

Alex
]
Nigel Daley made changes - 24/Feb/09 11:30 PM
Fix Version/s 0.19.1 [ 12313473 ]
Fix Version/s 0.19.2 [ 12313650 ]
Brian Bockelman added a comment - 19/Mar/09 10:29 PM
Hey Steve,

1) This does not break Ganglia 3.0. It gives a subclass of the Ganglia 3.0 implementation that is compatible with 3.1.

2) I'll have to work on this - we're currently using the patch for 0.19, so I haven't looked at the DNS class yet. Can you give an example (or fix the patch yourself) ?


Brian Bockelman added a comment - 27/Mar/09 08:46 PM
I believe this works - Steve, can you look over my usage of the DNS class?

Brian Bockelman made changes - 27/Mar/09 08:46 PM
Attachment HADOOP-4675-v5.patch [ 12403843 ]
Brian Bockelman made changes - 28/Mar/09 09:04 AM
Status Patch Available [ 10002 ] In Progress [ 3 ]
Brian Bockelman added a comment - 28/Mar/09 09:06 AM
Ok - I think the patch is ready - why is there no "Submit Patch" from JIRA? Did I do something wrong?

Brian Bockelman added a comment - 29/Mar/09 12:45 AM
Hm - found a bug in the v5 patch. This next one currently uses the DNS class correctly as Steve suggested.

This next one includes a test case for Ganglia – it's quite simple. All it does is configure the MiniDFSCluster to report to Ganglia31, and then starts up a separate thread which listens for metrics. It verifies that the hostname is properly set inside the packet sent to Ganglia, but does not verify that all the formatting is correct.

This appears to be the first test for Ganglia? Probably should have written it awhile ago.


Brian Bockelman made changes - 29/Mar/09 12:45 AM
Attachment HADOOP-4675-v6.patch [ 12404072 ]
Steve Loughran added a comment - 30/Mar/09 01:09 PM
Looks good to me, +1.

One thing to consider in future is moving the code to listen (and later validate) the ganglia 3.1 messages should be moved somewhere where it can be redistributed/reused in other tests and tools. That can wait until there is more validation.


Nigel Daley made changes - 09/Apr/09 04:05 PM
Resolution Incomplete [ 4 ]
Status In Progress [ 3 ] Resolved [ 5 ]
Nigel Daley made changes - 09/Apr/09 04:05 PM
Assignee Brian Bockelman [ bockelman ]
Status Resolved [ 5 ] Reopened [ 4 ]
Resolution Incomplete [ 4 ]
Brian Bockelman added a comment - 09/Apr/09 04:10 PM
Ok, submitting version 6 to Hudson. Includes a test case and correct DNS usage. Go Hudson, go!

Brian Bockelman made changes - 09/Apr/09 04:10 PM
Status Reopened [ 4 ] Patch Available [ 10002 ]
Hadoop QA added a comment - 09/Apr/09 10:30 PM
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12404072/HADOOP-4675-v6.patch
against trunk revision 763728.

+1 @author. The patch does not contain any @author tags.

+1 tests included. The patch appears to include 3 new or modified tests.

-1 patch. The patch command could not apply the patch.

Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/174/console

This message is automatically generated.


Nigel Daley made changes - 23/Apr/09 07:07 PM
Fix Version/s 0.20.0 [ 12313438 ]
Brian Bockelman added a comment - 05/May/09 02:22 AM
v7 of the patch - I think I've fixed the 'patch' issue in Hudson - just a bum development environment on my laptop.

Brian Bockelman made changes - 05/May/09 02:22 AM
Attachment HADOOP-4675-v7.patch [ 12407207 ]
Brian Bockelman made changes - 05/May/09 02:22 AM
Status Patch Available [ 10002 ] Open [ 1 ]
Brian Bockelman made changes - 05/May/09 02:23 AM
Fix Version/s 0.20.1 [ 12313866 ]
Status Open [ 1 ] Patch Available [ 10002 ]
Hadoop QA added a comment - 08/May/09 06:25 AM
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12407207/HADOOP-4675-v7.patch
against trunk revision 772688.

+1 @author. The patch does not contain any @author tags.

+1 tests included. The patch appears to include 3 new or modified tests.

+1 javadoc. The javadoc tool did not generate any warning messages.

+1 javac. The applied patch does not increase the total number of javac compiler warnings.

-1 findbugs. The patch appears to introduce 3 new Findbugs warnings.

+1 Eclipse classpath. The patch retains Eclipse classpath integrity.

-1 release audit. The applied patch generated 476 release audit warnings (more than the trunk's current 475 warnings).

-1 core tests. The patch failed core unit tests.

-1 contrib tests. The patch failed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/296/testReport/
Release audit warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/296/artifact/trunk/current/releaseAuditDiffWarnings.txt
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/296/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/296/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/296/console

This message is automatically generated.


Hadoop QA added a comment - 08/May/09 10:25 AM
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12407207/HADOOP-4675-v7.patch
against trunk revision 772844.

+1 @author. The patch does not contain any @author tags.

+1 tests included. The patch appears to include 3 new or modified tests.

+1 javadoc. The javadoc tool did not generate any warning messages.

+1 javac. The applied patch does not increase the total number of javac compiler warnings.

-1 findbugs. The patch appears to introduce 3 new Findbugs warnings.

+1 Eclipse classpath. The patch retains Eclipse classpath integrity.

+1 release audit. The applied patch does not increase the total number of release audit warnings.

+1 core tests. The patch passed core unit tests.

-1 contrib tests. The patch failed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/299/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/299/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/299/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/299/console

This message is automatically generated.


Johan Oskarsson added a comment - 28/May/09 03:27 PM

Johan Oskarsson made changes - 28/May/09 03:27 PM
Fix Version/s 0.20.1 [ 12313866 ]
Fix Version/s 0.19.2 [ 12313650 ]
Fix Version/s 0.21.0 [ 12313563 ]
Status Patch Available [ 10002 ] Open [ 1 ]
Repository Revision Date User Message
ASF #788837 Fri Jun 26 19:49:08 UTC 2009 stack HBASE-1587 Update ganglia config and doc to account for ganglia 3.1 and HADOOP-4675
Files Changed
MODIFY /hadoop/hbase/trunk/conf/hadoop-metrics.properties
MODIFY /hadoop/hbase/trunk/CHANGES.txt

Scott Beardsley added a comment - 19/Aug/09 06:46 AM
add support for Ganglia 3.1 metrics

Scott Beardsley made changes - 19/Aug/09 06:46 AM
Attachment HADOOP-4675-v8.patch [ 12416987 ]
Scott Beardsley added a comment - 19/Aug/09 06:49 AM
These minor changes in this[1] patch should fix the findbugs warnings in previous versions. Tested against rev 805676 (current trunk)

[1] https://issues.apache.org/jira/secure/attachment/12416987/HADOOP-4675-v8.patch

--------
[exec] +1 overall.
[exec]
[exec] +1 @author. The patch does not contain any @author tags.
[exec]
[exec] +1 tests included. The patch appears to include 3 new or modified tests.
[exec]
[exec] +1 javadoc. The javadoc tool did not generate any warning messages.
[exec]
[exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings.
[exec]
[exec] +1 findbugs. The patch does not introduce any new Findbugs warnings.
[exec]
[exec] +1 release audit. The applied patch does not increase the total number of release audit warnings.


Scott Beardsley made changes - 19/Aug/09 06:49 AM
Tags ganglia metrics
Affects Version/s 0.18.2 [ 12313424 ]
Affects Version/s 0.21.0 [ 12313563 ]
Release Note Support for reporting metrics to Ganglia 3.1 servers
Status Open [ 1 ] Patch Available [ 10002 ]
Hadoop QA added a comment - 19/Aug/09 12:07 PM
+1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12416987/HADOOP-4675-v8.patch
against trunk revision 804918.

+1 @author. The patch does not contain any @author tags.

+1 tests included. The patch appears to include 3 new or modified tests.

+1 javadoc. The javadoc tool did not generate any warning messages.

+1 javac. The applied patch does not increase the total number of javac compiler warnings.

+1 findbugs. The patch does not introduce any new Findbugs warnings.

+1 release audit. The applied patch does not increase the total number of release audit warnings.

+1 core tests. The patch passed core unit tests.

+1 contrib tests. The patch passed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/613/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/613/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/613/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/613/console

This message is automatically generated.


stack added a comment - 27/Aug/09 08:36 PM
I will commit this patch in next day or so unless objection.

George Porter added a comment - 02/Sep/09 08:27 PM
Any word on whether this is in the trunk now? Thanks!

Repository Revision Date User Message
ASF #810709 Wed Sep 02 22:01:34 UTC 2009 stack HADOOP-4675 Current Ganglia metrics implementation is incompatible with Ganglia 3.1
Files Changed
MODIFY /hadoop/common/trunk/CHANGES.txt
MODIFY /hadoop/common/trunk/src/java/org/apache/hadoop/metrics/ganglia/GangliaContext.java
MODIFY /hadoop/common/trunk/conf/hadoop-metrics.properties

Repository Revision Date User Message
ASF #810710 Wed Sep 02 22:02:04 UTC 2009 stack HADOOP-4675 Current Ganglia metrics implementation is incompatible with Ganglia 3.1
Files Changed
ADD /hadoop/common/trunk/src/java/org/apache/hadoop/metrics/ganglia/GangliaContext31.java

Hudson added a comment - 02/Sep/09 10:09 PM
Integrated in Hadoop-Common-trunk-Commit #9 (See http://hudson.zones.apache.org/hudson/job/Hadoop-Common-trunk-Commit/9/)
Current Ganglia metrics implementation is incompatible with Ganglia 3.1

Hudson added a comment - 02/Sep/09 10:19 PM
Integrated in Hadoop-Common-trunk-Commit #10 (See http://hudson.zones.apache.org/hudson/job/Hadoop-Common-trunk-Commit/10/)
Current Ganglia metrics implementation is incompatible with Ganglia 3.1

Repository Revision Date User Message
ASF #810714 Wed Sep 02 22:19:42 UTC 2009 stack HADOOP-4675 Current Ganglia metrics implementation is incompatible with Ganglia 3.1 -- reversing this patch
Files Changed
MODIFY /hadoop/common/trunk/CHANGES.txt
MODIFY /hadoop/common/trunk/src/java/org/apache/hadoop/metrics/ganglia/GangliaContext.java
DEL /hadoop/common/trunk/src/java/org/apache/hadoop/metrics/ganglia/GangliaContext31.java
MODIFY /hadoop/common/trunk/conf/hadoop-metrics.properties

stack added a comment - 02/Sep/09 10:22 PM
I applied the patch but then noticed that the unit test isn't being run; its got the wrong package on it and so applies outside of src/test/core. It then fails to compile because it uses minidfscluster it in the 'correct' location – under src/test/core. I reversed the patch until this is addressed.

Hudson added a comment - 02/Sep/09 10:29 PM
Integrated in Hadoop-Common-trunk-Commit #11 (See http://hudson.zones.apache.org/hudson/job/Hadoop-Common-trunk-Commit/11/)
Current Ganglia metrics implementation is incompatible with Ganglia 3.1 – reversing this patch

stack added a comment - 04/Sep/09 12:17 AM
If someone can fix the patch so the ganglia test is applied to src/test/core and not to src/test I will apply (Will have to do another spin through hudson....). This patch has been working well for me for months.

Matt Massie added a comment - 04/Sep/09 12:40 AM
Trivial change to drop TestGangliaContext31 into src/test/core instead of src/test per stack's request.

Matt Massie made changes - 04/Sep/09 12:40 AM
Attachment HADOOP-4675-v9.patch [ 12418570 ]
Scott Beardsley added a comment - 04/Sep/09 01:24 AM
Thanks Matt, you beat me to it.

stack added a comment - 04/Sep/09 04:16 AM
Thanks lads.

@Matt: Does it compile for you? I get this when I apply to current common TRUNK:

[javac] Compiling 126 source files to /Users/stack/checkouts/common/trunk/build/test/core/classes
    [javac] /Users/stack/checkouts/common/trunk/src/test/core/org/apache/hadoop/metrics/TestGangliaContext31.java:31: package org.apache.hadoop.hdfs does not exist
    [javac] import org.apache.hadoop.hdfs.MiniDFSCluster;
    [javac]                              ^

Here's my version:

URL: https://svn.apache.org/repos/asf/hadoop/common/trunk
Repository Root: https://svn.apache.org/repos/asf
Repository UUID: 13f79535-47bb-0310-9956-ffa450edef68
Revision: 811214
...

Common doesn't bundle hdfs. I'm doing something wrong or the test no longer suits after the refactoring of hadoop into different modules.

If no minidfscluster available in common, then the test doesn't fit the new constrained environment (though the test is creative with its setup of a listener to catch and verify the ganglia emissions). I'd be grand w/ removing the test. What you all think?


Matt Massie added a comment - 04/Sep/09 07:02 AM
@stack: the patch compiles if you run, e.g...
CLASSPATH="../hadoop-hdfs/build/hadoop-hdfs-0.21.0-dev.jar:../hadoop-hdfs/build/hadoop-hdfs-test-0.21.0-dev.jar" ant bin-package

You're correct that MiniDFSCluster is not a part of hadoop-common but rather hadoop-hdfs.

In the future, I will run something like the following to test patches since I think it closely mimics Hudson, e.g.

ant test-patch -Dpatch.file=HADOOP-4675-v9.patch  -Dfindbugs.home=... -Djava5.home=... -Dforrest.home=...

where none of the non-hadoop-common jars are in the classpath.

I read through your test and agree: it is creative. It's a shame that MiniDFSCluster isn't available in common. Maybe the test should be commented out until Mini clusters are available for common tests? It's not ideal but having this patch linger on without resolution isn't ideal either. The code in the patch shouldn't be prone to bugs since it's simply sending specially crafted UDP messages. I trust that Java sockets have been thoroughly tested and the xdr_ functions are very straight-forward (e.g. xdr_int is basically htonl()).

Btw, your approach to sending the ganglia meta data each time is practical. It can be optimized in the future but that might add unwelcome complexity.


Tom White added a comment - 04/Sep/09 04:05 PM
There are a couple of options I can see.

1. Convert the test to be more of a unit test of GangliaContext31. So rather than use MiniDFSCluster, inject metrics directly into the context and check that they are correctly received by the socket listener.
2. Move the test to the HDFS project as an integration test.

Ideally we'd do both, since 1 is a unit test for Ganglia, and 2 tests the integration of metrics and HDFS.


Tom White made changes - 07/Sep/09 10:41 AM
Status Patch Available [ 10002 ] Open [ 1 ]
Brian Bockelman added a comment - 16/Sep/09 02:39 AM
Hey Tom,

I'm a bit confused on what to do next. Are you suggesting that we mark this as WONT FIX and re-open for HDFS? It's been about 9 months since I submitted the patch, so I'm mentally a bit behind.

Brian


Tom White added a comment - 16/Sep/09 02:58 PM

Are you suggesting that we mark this as WONT FIX and re-open for HDFS?

No, I think writing a unit test for GangliaContext31 (that doesn't use MiniDFSCluster) is needed to commit this. At a later point we could then follow up with a separate HDFS issue that has the test using MiniDFSCluster.