Hadoop HDFS / HDFS-412

Hadoop JMX usage makes Nagios monitoring impossible

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.21.0
    • Component/s: None
    • Labels: None
    • Hadoop Flags: Reviewed

      Description

      When Hadoop reports Datanode information to JMX, the bean uses the name "DataNode-" + storageid. The storage ID incorporates a random number and is unpredictable.

      This prevents me from monitoring DFS datanodes through Hadoop using the JMX interface; in order to do that, you must be able to specify the bean name on the command line.

      The fix is simple; a patch will be coming momentarily. However, there was probably a reason for giving each datanode a unique name that I'm unaware of, so it'd be nice to hear from the metrics maintainer.
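
To illustrate why a predictable bean name matters to an external check, here is a minimal sketch. It is not Hadoop's code: the `demo` domain, the `DemoStats` bean, its `BlocksRead` attribute, and the `host1-50075` suffix are all hypothetical. It uses only the standard `javax.management` API. The point is that the lookup succeeds only when the caller can build the full object name in advance, which is not possible when the name ends in a random storage ID.

```java
import java.lang.management.ManagementFactory;
import javax.management.MBeanServer;
import javax.management.ObjectName;

public class PredictableBeanNameSketch {

    // Hypothetical management interface; not Hadoop's real DataNode MBean.
    public interface DemoStatsMBean {
        long getBlocksRead();
    }

    // Standard MBean: the class name plus "MBean" matches the interface name.
    public static class DemoStats implements DemoStatsMBean {
        @Override
        public long getBlocksRead() {
            return 42L;
        }
    }

    // Builds the object name from a suffix the caller already knows.
    // The "demo" domain and key layout are assumptions for illustration.
    public static ObjectName beanName(String suffix) throws Exception {
        return new ObjectName("demo:service=DataNode,name=DataNode-" + suffix);
    }

    // What a monitoring check does: read one attribute by exact bean name.
    public static long check(MBeanServer server, ObjectName name) throws Exception {
        return (Long) server.getAttribute(name, "BlocksRead");
    }

    public static void main(String[] args) throws Exception {
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        ObjectName name = beanName("host1-50075");
        server.registerMBean(new DemoStats(), name);
        System.out.println(check(server, name));
        server.unregisterMBean(name);
    }
}
```

If the suffix were a random number chosen at startup, the check would have to discover the name first (for example with a pattern query) instead of passing it on the command line.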

      1. hdfs-412.patch
        3 kB
        Tom White
      2. hadoop-4482.patch
        3 kB
        Tom White
      3. jmx_name_replaced.patch
        0.7 kB
        Brian Bockelman
      4. jmx_name.patch
        0.7 kB
        Brian Bockelman

          Activity

          Brian Bockelman added a comment -

          Make the bean name non-unique for the datanode.

          dhruba borthakur added a comment -

          Maybe you could use the machine name + HTTP port to uniquely identify a datanode. This will identify each datanode uniquely in JMXland while at the same time being constant for each datanode.

          Brian Bockelman added a comment -

          Hey Dhruba,

          Great idea! The next attached file replaces the previous one.

          Brian

          dhruba borthakur added a comment -

          +1. Code looks good.

          Owen O'Malley added a comment -

          I just committed this. Thanks, Brian!

          dhruba borthakur added a comment -

          This has probably been committed only to trunk, but the JIRA says "fixed for 0.19.1".

          Tsz Wo Nicholas Sze added a comment -

          The patch does not work well, see HADOOP-4520.

          Hudson added a comment -

          Integrated in Hadoop-trunk #642 (See http://hudson.zones.apache.org/hudson/job/Hadoop-trunk/642/)
          . Make the JMX monitoring use predictable names for the
          datanodes to enable Nagios monitoring. (Brian Bockelman via omalley)

          Owen O'Malley added a comment -

          I've reverted this until a correct fix is available.

          Brian Bockelman added a comment -

          Hey Owen:

          Is there a documentation page explaining how to contribute unit tests to Hadoop? I'd like to write a test for issues like this (and a few other related metrics ones; lots of things are broken right now), but I'm not sure of the best place to start.

          Brian

          dhruba borthakur added a comment -

          There is a section inside http://wiki.apache.org/hadoop/HowToContribute that describes unit tests.

          Hudson added a comment -

          Integrated in Hadoop-trunk #647 (See http://hudson.zones.apache.org/hudson/job/Hadoop-trunk/647/)
          Revert while it is being fixed.

          Tom White added a comment -

          The problem with the previous patch was that the JMX object name had an illegal ':' character in it, and this error was being masked by MBeanUtil which logs exceptions, but doesn't throw them. This patch replaces the ':' with a '-', and I verified that TestDataNodeMetrics doesn't log an exception. I've also opened HADOOP-5237 to address the exception logging (and handling).
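
The failure mode described above can be reproduced with the standard `javax.management.ObjectName` class alone. The following is a minimal sketch, not the actual patch: the `demo` domain, the key layout, and the `DataNodeInfo-host1:50075` value are hypothetical stand-ins for whatever name the datanode really registers. An unquoted key property value may not contain a colon, so the constructor throws `MalformedObjectNameException`; replacing ':' with '-' yields a legal name.

```java
import javax.management.MalformedObjectNameException;
import javax.management.ObjectName;

public class ObjectNameColonSketch {

    // Replace the ':' between host and port with '-', as the comment describes.
    public static String sanitize(String nameValue) {
        return nameValue.replace(':', '-');
    }

    // Returns true if the value can be used, unquoted, in an ObjectName.
    // The "demo" domain and key names are assumptions for illustration.
    public static boolean isValidName(String nameValue) {
        try {
            new ObjectName("demo:service=DataNode,name=" + nameValue);
            return true;
        } catch (MalformedObjectNameException e) {
            // A caller that only logs this exception hides the failure;
            // here it is surfaced as a boolean so it cannot go unnoticed.
            return false;
        }
    }

    public static void main(String[] args) {
        String raw = "DataNodeInfo-host1:50075"; // hypothetical name value
        System.out.println(isValidName(raw));           // false
        System.out.println(isValidName(sanitize(raw))); // true
    }
}
```

`ObjectName.quote(String)` is another standard way to carry a colon in a value, but it changes the name that monitoring tools must type, so plain character replacement keeps the command line simpler.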

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12400111/hadoop-4482.patch
          against trunk revision 744000.

          +1 @author. The patch does not contain any @author tags.

          -1 tests included. The patch doesn't appear to include any new or modified tests.
          Please justify why no tests are needed for this patch.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs warnings.

          +1 Eclipse classpath. The patch retains Eclipse classpath integrity.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed core unit tests.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3845/testReport/
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3845/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3845/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3845/console

          This message is automatically generated.

          Tom White added a comment -

          The change is tested by existing unit tests, so it doesn't need new tests (but see also HADOOP-5237).

          Tom White added a comment -

          Regenerated following project split.

          Tom White added a comment -

          Re-submitting to Hudson.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12411401/hdfs-412.patch
          against trunk revision 811493.

          +1 @author. The patch does not contain any @author tags.

          -1 tests included. The patch doesn't appear to include any new or modified tests.
          Please justify why no new tests are needed for this patch.
          Also please list what manual steps were performed to verify this patch.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed core unit tests.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/14/testReport/
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/14/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/14/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/14/console

          This message is automatically generated.

          Tom White added a comment -

          I've just committed this. Thanks Brian!

          Hudson added a comment -

          Integrated in Hdfs-Patch-h2.grid.sp2.yahoo.net #5 (See http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h2.grid.sp2.yahoo.net/5/)
          . Hadoop JMX usage makes Nagios monitoring impossible. Contributed by Brian Bockelman.

          Hudson added a comment -

          Integrated in Hdfs-Patch-h5.grid.sp2.yahoo.net #21 (See http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/21/)


            People

            • Assignee: Brian Bockelman
            • Reporter: Brian Bockelman
            • Votes: 0
            • Watchers: 6
