Hadoop HDFS / HDFS-412

Hadoop JMX usage makes Nagios monitoring impossible

Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.21.0
    • Component/s: None
    • Labels: None
    • Hadoop Flags: Reviewed

    Description

      When Hadoop reports Datanode information to JMX, the bean uses the name "DataNode-" + storageid. The storage ID incorporates a random number and is unpredictable.

      This prevents me from monitoring HDFS DataNodes with Nagios through the JMX interface: to do that, you must be able to specify the bean name on the command line.

      The fix is simple; a patch will follow shortly. However, there was probably a reason for giving every DataNode a unique name that I'm unaware of, so it would be good to hear from the metrics maintainer.
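      A minimal Java sketch of why the storage-ID-based name gets in the way of a fixed monitoring command line. Only the "DataNode-" + storage ID scheme comes from the description above; the `hadoop:service=<service>,name=<name>` object-name layout is an assumption about how Hadoop's metrics helper composed names at the time, the storage IDs are made up, and `StorageIdBeanName` is a hypothetical class name.

```java
import javax.management.MalformedObjectNameException;
import javax.management.ObjectName;

/** Sketch: a DataNode bean name built from the storage ID differs from node to node. */
final class StorageIdBeanName {
  private StorageIdBeanName() {}

  // Assumed layout: "hadoop:service=<service>,name=<name>".
  // Wraps the checked exception so callers see a malformed name immediately.
  static ObjectName beanName(String service, String name) {
    try {
      return new ObjectName("hadoop:service=" + service + ",name=" + name);
    } catch (MalformedObjectNameException e) {
      throw new IllegalArgumentException("Illegal JMX object name for " + name, e);
    }
  }

  // Naming scheme described in this issue: "DataNode-" + storage ID.
  static ObjectName fromStorageId(String storageId) {
    return beanName("DataNode", "DataNode-" + storageId);
  }

  public static void main(String[] args) {
    // Hypothetical storage IDs; the random component makes them unpredictable.
    ObjectName first = fromStorageId("DS-1804289383-10.0.0.5-50010");
    ObjectName second = fromStorageId("DS-846930886-10.0.0.5-50010");
    System.out.println(first);
    System.out.println(second);
    // A check configured with one of these names will not find the other bean.
    System.out.println("Equal: " + first.equals(second));
  }
}
```

      Because the random part of the storage ID cannot be known in advance, each node's bean name would have to be looked up individually before a Nagios check could be written for it.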

      Attachments

        1. hdfs-412.patch
          3 kB
          Thomas White
        2. hadoop-4482.patch
          3 kB
          Thomas White
        3. jmx_name_replaced.patch
          0.7 kB
          Brian Bockelman
        4. jmx_name.patch
          0.7 kB
          Brian Bockelman

          Activity

            bockelman Brian Bockelman added a comment - Make the bean name non-unique for the datanode.

            dhruba Dhruba Borthakur added a comment - Maybe you could use the machine name + HTTP port to uniquely identify a datanode. This will identify each datanode uniquely in JMX-land while at the same time being constant for each datanode.
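            Dhruba's suggestion can be sketched as a pure function of the machine name and HTTP port. This is an illustration under assumptions, not the committed patch: `StableDataNodeName.beanName` is a hypothetical helper, the host name is invented, and 50075 is used only because it was the usual DataNode HTTP port in Hadoop releases of this era. A `-` separator is chosen because `:` is not a legal character in an unquoted JMX `ObjectName` value.

```java
import java.util.Objects;

/** Sketch: derive a DataNode bean name from machine name + HTTP port. */
final class StableDataNodeName {
  private StableDataNodeName() {}

  // Hypothetical helper: "DataNode-" + machine name + "-" + HTTP port.
  // The result depends only on its inputs, so it is the same on every restart.
  static String beanName(String machineName, int httpPort) {
    Objects.requireNonNull(machineName, "machineName");
    if (httpPort <= 0 || httpPort > 65535) {
      throw new IllegalArgumentException("invalid port: " + httpPort);
    }
    return "DataNode-" + machineName + "-" + httpPort;
  }

  public static void main(String[] args) {
    // Same host and port give the same name, so it can be hard-coded in a check.
    System.out.println(beanName("dn01.example.org", 50075));
    // A second DataNode on the same host needs its own HTTP port, so its name differs.
    System.out.println(beanName("dn01.example.org", 50076));
  }
}
```

            Since two DataNodes cannot bind the same HTTP port on one machine, the names stay unique while remaining predictable enough to type into a Nagios command line.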

            bockelman Brian Bockelman added a comment - Hey Dhruba, Great idea! The next attached file replaces the previous one. Brian

            dhruba Dhruba Borthakur added a comment - +1. Code looks good.
            omalley Owen O'Malley added a comment - I just committed this. Thanks, Brian!

            dhruba Dhruba Borthakur added a comment - This has probably been committed only to trunk, but the JIRA says "fixed for 0.19.1".
            szetszwo Tsz-wo Sze added a comment - The patch does not work well; see HADOOP-4520.
            hudson Hudson added a comment - Integrated in Hadoop-trunk #642 (See http://hudson.zones.apache.org/hudson/job/Hadoop-trunk/642/ ) . Make the JMX monitoring use predictable names for the datanodes to enable Nagios monitoring. (Brian Bockelman via omalley)
            omalley Owen O'Malley added a comment - I've reverted this until a correct fix is available.

            bockelman Brian Bockelman added a comment - Hey Owen: Is there a documentation page explaining how to contribute unit tests to Hadoop? I'd like to write a test for issues like this (and for a few other related metrics issues; lots of things are broken right now), but I'm not sure of the best place to start. Brian

            dhruba Dhruba Borthakur added a comment - There is a section inside http://wiki.apache.org/hadoop/HowToContribute that describes unit tests.
            hudson Hudson added a comment - Integrated in Hadoop-trunk #647 (See http://hudson.zones.apache.org/hudson/job/Hadoop-trunk/647/ ) Revert while it is being fixed.
            tomwhite Thomas White added a comment - The problem with the previous patch was that the JMX object name had an illegal ':' character in it, and this error was being masked by MBeanUtil which logs exceptions, but doesn't throw them. This patch replaces the ':' with a '-', and I verified that TestDataNodeMetrics doesn't log an exception. I've also opened HADOOP-5237 to address the exception logging (and handling).
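            The failure described in this comment can be reproduced with the JDK's `javax.management.ObjectName`, which rejects `:` inside an unquoted key-property value. The sketch below is hypothetical: `ObjectNameCheck` and its methods are not Hadoop code, and the `hadoop:service=...,name=...` layout is assumed. It checks a name with and without the `:` to `-` replacement and, unlike a helper that only logs, turns a malformed name into an exception the caller cannot miss.

```java
import javax.management.MalformedObjectNameException;
import javax.management.ObjectName;

/** Sketch: ':' in an unquoted ObjectName value is illegal; '-' is fine. */
final class ObjectNameCheck {
  private ObjectNameCheck() {}

  // True if the string parses as a JMX ObjectName.
  static boolean isLegal(String name) {
    try {
      new ObjectName(name);
      return true;
    } catch (MalformedObjectNameException e) {
      return false;
    }
  }

  // Replace ':' with '-', the substitution described in the comment above.
  static String sanitizeValue(String value) {
    return value.replace(':', '-');
  }

  // Surface the error instead of only logging it.
  static ObjectName toObjectNameOrThrow(String name) {
    try {
      return new ObjectName(name);
    } catch (MalformedObjectNameException e) {
      throw new IllegalArgumentException("Illegal JMX object name: " + name, e);
    }
  }

  public static void main(String[] args) {
    String hostPort = "dn01.example.org:50075"; // hypothetical machine name + HTTP port
    String bad = "hadoop:service=DataNode,name=DataNode-" + hostPort;
    String good = "hadoop:service=DataNode,name=DataNode-" + sanitizeValue(hostPort);
    System.out.println(bad + " legal? " + isLegal(bad));
    System.out.println(good + " legal? " + isLegal(good));
  }
}
```

            The first `:` in a name separates the domain from the key properties, so any further `:` inside a value has to be removed or the value quoted; replacing it with `-` keeps the name readable on a command line.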
            hadoopqa Hadoop QA added a comment -

            -1 overall. Here are the results of testing the latest attachment
            http://issues.apache.org/jira/secure/attachment/12400111/hadoop-4482.patch
            against trunk revision 744000.

            +1 @author. The patch does not contain any @author tags.

            -1 tests included. The patch doesn't appear to include any new or modified tests.
            Please justify why no tests are needed for this patch.

            +1 javadoc. The javadoc tool did not generate any warning messages.

            +1 javac. The applied patch does not increase the total number of javac compiler warnings.

            +1 findbugs. The patch does not introduce any new Findbugs warnings.

            +1 Eclipse classpath. The patch retains Eclipse classpath integrity.

            +1 release audit. The applied patch does not increase the total number of release audit warnings.

            +1 core tests. The patch passed core unit tests.

            +1 contrib tests. The patch passed contrib unit tests.

            Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3845/testReport/
            Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3845/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
            Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3845/artifact/trunk/build/test/checkstyle-errors.html
            Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3845/console

            This message is automatically generated.

            tomwhite Thomas White added a comment - The change is covered by existing unit tests, so it doesn't need new tests (but see also HADOOP-5237).
            tomwhite Thomas White added a comment - Regenerated following project split.
            tomwhite Thomas White added a comment - Re-submitting to Hudson.
            hadoopqa Hadoop QA added a comment -

            -1 overall. Here are the results of testing the latest attachment
            http://issues.apache.org/jira/secure/attachment/12411401/hdfs-412.patch
            against trunk revision 811493.

            +1 @author. The patch does not contain any @author tags.

            -1 tests included. The patch doesn't appear to include any new or modified tests.
            Please justify why no new tests are needed for this patch.
            Also please list what manual steps were performed to verify this patch.

            +1 javadoc. The javadoc tool did not generate any warning messages.

            +1 javac. The applied patch does not increase the total number of javac compiler warnings.

            +1 findbugs. The patch does not introduce any new Findbugs warnings.

            +1 release audit. The applied patch does not increase the total number of release audit warnings.

            -1 core tests. The patch failed core unit tests.

            +1 contrib tests. The patch passed contrib unit tests.

            Test results: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/14/testReport/
            Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/14/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
            Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/14/artifact/trunk/build/test/checkstyle-errors.html
            Console output: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/14/console

            This message is automatically generated.

            tomwhite Thomas White added a comment - I've just committed this. Thanks Brian!
            hudson Hudson added a comment - Integrated in Hdfs-Patch-h2.grid.sp2.yahoo.net #5 (See http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h2.grid.sp2.yahoo.net/5/ ) . Hadoop JMX usage makes Nagios monitoring impossible. Contributed by Brian Bockelman.
            hudson Hudson added a comment - Integrated in Hdfs-Patch-h5.grid.sp2.yahoo.net #21 (See http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/21/ )

            People

              Assignee: Brian Bockelman (bockelman)
              Reporter: Brian Bockelman (bockelman)
              Votes: 0
              Watchers: 6
