When Hadoop reports Datanode information to JMX, the bean uses the name "DataNode-" + storageid. The storage ID incorporates a random number and is unpredictable.
This prevents me from monitoring DFS datanodes through Hadoop using the JMX interface; in order to do that, you must be able to specify the bean name on the command line.
The fix is simple; a patch will be coming momentarily. However, there was probably a reason for giving every datanode a unique name that I'm unaware of, so it would be nice to hear from the metrics maintainer.
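To illustrate the problem described above, here is a minimal sketch using only `javax.management.ObjectName`. The `hadoop:service=DataNode,name=` prefix and the storage ID value are assumptions made up for the example, not taken from the Hadoop source. It shows that an exact bean name cannot match without knowing the random storage ID, whereas a wildcard pattern can; a command-line check that only accepts an exact name has no such option.

```java
import javax.management.MalformedObjectNameException;
import javax.management.ObjectName;

public class BeanNameLookup {
    // Assumed domain and key layout, for illustration only.
    static final String PREFIX = "hadoop:service=DataNode,name=";

    // Builds an ObjectName, converting the checked exception for brevity.
    static ObjectName name(String value) {
        try {
            return new ObjectName(PREFIX + value);
        } catch (MalformedObjectNameException e) {
            throw new IllegalArgumentException(e);
        }
    }

    // True if the (possibly wildcard) pattern value matches the actual name value.
    static boolean matches(String patternValue, String actualValue) {
        return name(patternValue).apply(name(actualValue));
    }

    public static void main(String[] args) {
        // Hypothetical storage ID; the real one contains a random number.
        String registered = "DataNode-DS-1193291234-10.0.0.5-50010-1224000000000";

        // An exact name that omits the storage ID does not match.
        System.out.println(matches("DataNode", registered));
        // A wildcard on the property value does match.
        System.out.println(matches("DataNode-*", registered));
    }
}
```

A tool that can issue pattern queries could work around the unpredictable suffix this way, but the issue as reported is about tools that need the full name up front.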
Dhruba Borthakur
added a comment - Maybe you could use the machine name + HTTP port to uniquely identify a datanode. This would identify each datanode uniquely in JMX land while remaining constant for each datanode.
Hudson
added a comment - Integrated in Hadoop-trunk #642 (See http://hudson.zones.apache.org/hudson/job/Hadoop-trunk/642/ )
Make the JMX monitoring use predictable names for the datanodes to enable Nagios monitoring. (Brian Bockelman via omalley)
Brian Bockelman
added a comment - Hey Owen:
Is there a documentation page explaining how to contribute unit tests to Hadoop? I'd like to write a test for issues like this (and a few other related metrics ones; lots of things are broken right now), but I'm not sure of the best place to start.
Brian
Hudson
added a comment - Integrated in Hadoop-trunk #647 (See http://hudson.zones.apache.org/hudson/job/Hadoop-trunk/647/ )
Revert while it is being fixed.
Thomas White
added a comment - The problem with the previous patch was that the JMX object name had an illegal ':' character in it, and this error was being masked by MBeanUtil which logs exceptions, but doesn't throw them. This patch replaces the ':' with a '-', and I verified that TestDataNodeMetrics doesn't log an exception. I've also opened HADOOP-5237 to address the exception logging (and handling).
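The ':' restriction mentioned above can be checked directly against `javax.management.ObjectName`, which rejects an unquoted property value containing a colon. The sketch below is not the patch itself: the `hadoop:service=DataNode,name=` layout, the `DataNode-` prefix on a host and port, and the helper names `safeName` and `isLegal` are assumptions for illustration.

```java
import javax.management.MalformedObjectNameException;
import javax.management.ObjectName;

public class BeanNameCheck {
    // Hypothetical helper: derives a name from "host:port",
    // replacing ':' with '-' as the comment above describes.
    static String safeName(String hostAndPort) {
        return "DataNode-" + hostAndPort.replace(':', '-');
    }

    // Returns false instead of swallowing the error, so a bad name is visible.
    static boolean isLegal(String nameValue) {
        try {
            // Assumed domain and key layout, for illustration only.
            new ObjectName("hadoop:service=DataNode,name=" + nameValue);
            return true;
        } catch (MalformedObjectNameException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Unquoted value with ':' is rejected by ObjectName.
        System.out.println(isLegal("DataNode-host1:50075"));
        // The same value with '-' in place of ':' is accepted.
        System.out.println(isLegal(safeName("host1:50075")));
    }
}
```

Surfacing the failure to the caller, rather than only logging it, is the kind of handling the follow-up issue HADOOP-5237 is meant to address.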
Hadoop QA
added a comment - -1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12400111/hadoop-4482.patch
against trunk revision 744000.
+1 @author. The patch does not contain any @author tags.
-1 tests included. The patch doesn't appear to include any new or modified tests.
Please justify why no tests are needed for this patch.
+1 javadoc. The javadoc tool did not generate any warning messages.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 findbugs. The patch does not introduce any new Findbugs warnings.
+1 Eclipse classpath. The patch retains Eclipse classpath integrity.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
+1 core tests. The patch passed core unit tests.
+1 contrib tests. The patch passed contrib unit tests.
Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3845/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3845/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3845/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3845/console
This message is automatically generated.
Hadoop QA
added a comment - -1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12411401/hdfs-412.patch
against trunk revision 811493.
+1 @author. The patch does not contain any @author tags.
-1 tests included. The patch doesn't appear to include any new or modified tests.
Please justify why no new tests are needed for this patch.
Also please list what manual steps were performed to verify this patch.
+1 javadoc. The javadoc tool did not generate any warning messages.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 findbugs. The patch does not introduce any new Findbugs warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
-1 core tests. The patch failed core unit tests.
+1 contrib tests. The patch passed contrib unit tests.
Test results: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/14/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/14/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/14/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/14/console
This message is automatically generated.
Hudson
added a comment - Integrated in Hdfs-Patch-h2.grid.sp2.yahoo.net #5 (See http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h2.grid.sp2.yahoo.net/5/ )
Hadoop JMX usage makes Nagios monitoring impossible. Contributed by Brian Bockelman.
Hudson
added a comment - Integrated in Hdfs-Patch-h5.grid.sp2.yahoo.net #21 (See http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/21/ )
People
Brian Bockelman
Brian Bockelman
Make the bean name non-unique for the datanode.