Hadoop HDFS / HDFS-412

Hadoop JMX usage makes Nagios monitoring impossible

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.21.0
    • Component/s: None
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      When Hadoop reports Datanode information to JMX, the bean uses the name "DataNode-" + storageid. The storage ID incorporates a random number and is unpredictable.

      This prevents me from monitoring DFS datanodes through Hadoop using the JMX interface; in order to do that, you must be able to specify the bean name on the command line.

      The fix is simple; a patch will be coming momentarily. However, there was probably a reason, which I'm unaware of, for giving all the datanodes unique names, so it'd be nice to hear from the metrics maintainer.
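
To illustrate why the exact bean name matters to a monitoring check, here is a minimal sketch. It queries the local JVM's platform MBean server rather than a remote datanode (a real Nagios plugin would connect through a JMX connector), and the datanode-style name with its numeric suffix is a hypothetical stand-in, not a name taken from Hadoop.

```java
import java.lang.management.ManagementFactory;
import javax.management.MBeanServer;
import javax.management.MalformedObjectNameException;
import javax.management.ObjectName;

public class BeanNameCheck {
    /** Returns true only if a bean is registered under exactly this name. */
    static boolean isRegistered(String beanName) {
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        try {
            return server.isRegistered(new ObjectName(beanName));
        } catch (MalformedObjectNameException e) {
            return false; // the string is not even a legal object name
        }
    }

    public static void main(String[] args) {
        // A fixed, well-known name: the JVM registers its Runtime bean here.
        System.out.println(isRegistered("java.lang:type=Runtime"));

        // Hypothetical datanode-style name with a guessed random suffix.
        // Unless the guess matches the registered name, the lookup finds nothing.
        System.out.println(isRegistered("hadoop:service=DataNode,name=DataNode-DS-12345"));
    }
}
```

A check configured with a fixed name works on every host; a name with a random component would have to be discovered per datanode before the check could be configured.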

      1. jmx_name.patch
        0.7 kB
        Brian Bockelman
      2. jmx_name_replaced.patch
        0.7 kB
        Brian Bockelman
      3. hdfs-412.patch
        3 kB
        Tom White
      4. hadoop-4482.patch
        3 kB
        Tom White

          Activity

          Brian Bockelman created issue -
          Brian Bockelman added a comment -

          Make the bean name non-unique for the datanode.

          Brian Bockelman made changes -
          Field Original Value New Value
          Attachment jmx_name.patch [ 12392602 ]
          dhruba borthakur added a comment -

          Maybe you could use the machine name + HTTP port to uniquely identify a datanode. This will identify each datanode uniquely in JMX-land while at the same time being constant for each datanode.
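
The suggestion above can be sketched as follows. This is an illustration only, not the attached patch: the class, the method names, the "DS-" prefix and the example hosts are all hypothetical, and the "-" separator is an assumption (chosen here because ':' is a reserved character in JMX object names).

```java
import java.util.Random;

public class DataNodeBeanNames {
    /** Name with a random component: unique, but impossible to predict. */
    static String randomizedName(Random rnd) {
        return "DataNode-DS-" + rnd.nextInt(Integer.MAX_VALUE);
    }

    /** Name derived from host and HTTP port: unique per datanode, yet constant. */
    static String stableName(String host, int httpPort) {
        return "DataNode-" + host + "-" + httpPort;
    }

    public static void main(String[] args) {
        // The randomized name depends on a value the monitoring side never sees.
        System.out.println(randomizedName(new Random()));

        // The stable name can be computed from the host list alone.
        System.out.println(stableName("dn1.example.com", 50075));
        System.out.println(stableName("dn2.example.com", 50075));
    }
}
```

Because the stable name depends only on values an operator already knows, a monitoring configuration can be generated from the cluster's host list.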

          Brian Bockelman added a comment -

          Hey Dhruba,

          Great idea! The next attached file replaces the previous one.

          Brian

          Brian Bockelman made changes -
          Attachment jmx_name_replaced.patch [ 12392618 ]
          dhruba borthakur added a comment -

          +1. Code looks good.

          dhruba borthakur made changes -
          Status Open [ 1 ] Patch Available [ 10002 ]
          Owen O'Malley made changes -
          Assignee Brian Bockelman [ bockelman ]
          Owen O'Malley added a comment -

          I just committed this. Thanks, Brian!

          Owen O'Malley made changes -
          Status Patch Available [ 10002 ] Resolved [ 5 ]
          Hadoop Flags [Reviewed]
          Resolution Fixed [ 1 ]
          dhruba borthakur added a comment -

          This has probably been committed only to trunk, but the JIRA says "fixed for 0.19.1".

          Tsz Wo Nicholas Sze added a comment -

          The patch does not work well, see HADOOP-4520.

          Hudson added a comment -

          Integrated in Hadoop-trunk #642 (See http://hudson.zones.apache.org/hudson/job/Hadoop-trunk/642/). Make the JMX monitoring use predictable names for the datanodes to enable Nagios monitoring. (Brian Bockelman via omalley)

          Tsz Wo Nicholas Sze made changes -
          Link This issue is blocked by HADOOP-4520 [ HADOOP-4520 ]
          Tsz Wo Nicholas Sze made changes -
          Link This issue is blocked by HADOOP-4520 [ HADOOP-4520 ]
          Tsz Wo Nicholas Sze made changes -
          Link This issue is related to HADOOP-4520 [ HADOOP-4520 ]
          Owen O'Malley made changes -
          Resolution Fixed [ 1 ]
          Status Resolved [ 5 ] Reopened [ 4 ]
          Owen O'Malley added a comment -

          I've reverted this until a correct fix is available.

          Brian Bockelman added a comment -

          Hey Owen:

          Is there a documentation page explaining how to contribute unit tests to Hadoop? I'd like to write a test for issues like this (and a few other related metrics ones; lots of things are broken right now), but I'm not sure of the best place to start.

          Brian

          dhruba borthakur added a comment -

          There is a section inside http://wiki.apache.org/hadoop/HowToContribute that describes unit tests.

          Hudson added a comment -

          Integrated in Hadoop-trunk #647 (See http://hudson.zones.apache.org/hudson/job/Hadoop-trunk/647/)
          Revert while it is being fixed.

          Nigel Daley made changes -
          Fix Version/s 0.20.0 [ 12313438 ]
          Tom White added a comment -

          The problem with the previous patch was that the JMX object name had an illegal ':' character in it, and this error was being masked by MBeanUtil which logs exceptions, but doesn't throw them. This patch replaces the ':' with a '-', and I verified that TestDataNodeMetrics doesn't log an exception. I've also opened HADOOP-5237 to address the exception logging (and handling).
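
The failure mode described above can be reproduced with the standard `javax.management.ObjectName` class alone. The sketch below is not the committed patch: the `hadoop:service=DataNode,name=DataNode-` prefix and the host name are assumptions made for illustration. It shows that an unquoted ':' in a property value is rejected and that replacing it with '-' yields a legal name.

```java
import javax.management.MalformedObjectNameException;
import javax.management.ObjectName;

public class ObjectNameColonDemo {
    /** Returns true if the string parses as a JMX object name. */
    static boolean isValid(String name) {
        try {
            new ObjectName(name);
            return true;
        } catch (MalformedObjectNameException e) {
            // Swallowing this exception silently is what hides the problem;
            // here the result is at least reported to the caller.
            return false;
        }
    }

    /** Replaces ':' with '-' in a host:port pair, as the comment above describes. */
    static String sanitize(String hostPort) {
        return hostPort.replace(':', '-');
    }

    public static void main(String[] args) {
        String prefix = "hadoop:service=DataNode,name=DataNode-"; // assumed prefix
        String hostPort = "dn1.example.com:50075";                // hypothetical datanode

        System.out.println(isValid(prefix + hostPort));           // false: ':' in the value
        System.out.println(isValid(prefix + sanitize(hostPort))); // true
    }
}
```

The first colon in an object name separates the domain from the key properties, so a second, unquoted colon inside a value cannot be parsed. `ObjectName.quote` would be an alternative way to keep the colon, at the cost of a quoted name that is more awkward to type on a command line.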

          Tom White made changes -
          Attachment hadoop-4482.patch [ 12400111 ]
          Tom White made changes -
          Hadoop Flags [Reviewed]
          Status Reopened [ 4 ] Patch Available [ 10002 ]
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12400111/hadoop-4482.patch
          against trunk revision 744000.

          +1 @author. The patch does not contain any @author tags.

          -1 tests included. The patch doesn't appear to include any new or modified tests.
          Please justify why no tests are needed for this patch.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs warnings.

          +1 Eclipse classpath. The patch retains Eclipse classpath integrity.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed core unit tests.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3845/testReport/
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3845/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3845/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3845/console

          This message is automatically generated.

          Tom White added a comment -

          The change is tested by existing unit tests, so it doesn't need new tests (but see also HADOOP-5237).

          Nigel Daley made changes -
          Fix Version/s 0.19.2 [ 12313650 ]
          Fix Version/s 0.19.1 [ 12313473 ]
          Nigel Daley made changes -
          Fix Version/s 0.20.0 [ 12313438 ]
          Tom White made changes -
          Project Hadoop Common [ 12310240 ] HDFS [ 12310942 ]
          Key HADOOP-4482 HDFS-412
          Affects Version/s 0.18.1 [ 12313357 ]
          Fix Version/s 0.19.2 [ 12313650 ]
          Component/s metrics [ 12310971 ]
          Tom White added a comment -

          Regenerated following project split.

          Tom White made changes -
          Attachment hdfs-412.patch [ 12411401 ]
          Tom White made changes -
          Status Patch Available [ 10002 ] Open [ 1 ]
          Tom White added a comment -

          Re-submitting to Hudson.

          Tom White made changes -
          Status Open [ 1 ] Patch Available [ 10002 ]
          Fix Version/s 0.21.0 [ 12314046 ]
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12411401/hdfs-412.patch
          against trunk revision 811493.

          +1 @author. The patch does not contain any @author tags.

          -1 tests included. The patch doesn't appear to include any new or modified tests.
          Please justify why no new tests are needed for this patch.
          Also please list what manual steps were performed to verify this patch.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed core unit tests.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/14/testReport/
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/14/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/14/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/14/console

          This message is automatically generated.

          Tom White added a comment -

          I've just committed this. Thanks Brian!

          Tom White made changes -
          Status Patch Available [ 10002 ] Resolved [ 5 ]
          Hadoop Flags [Reviewed]
          Resolution Fixed [ 1 ]
          Hudson added a comment -

          Integrated in Hdfs-Patch-h2.grid.sp2.yahoo.net #5 (See http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h2.grid.sp2.yahoo.net/5/). Hadoop JMX usage makes Nagios monitoring impossible. Contributed by Brian Bockelman.

          Hudson added a comment -

          Integrated in Hdfs-Patch-h5.grid.sp2.yahoo.net #21 (See http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/21/)

          Tom White made changes -
          Status Resolved [ 5 ] Closed [ 6 ]
          Transition                    Time In Source Status   Execution Times   Last Executer   Last Execution Date
          Resolved -> Reopened          4d 19h 50m              1                 Owen O'Malley   29/Oct/08 18:38
          Reopened -> Patch Available   105d 21h 46m            1                 Tom White       12/Feb/09 16:24
          Patch Available -> Open       206d 18h 19m            1                 Tom White       07/Sep/09 11:44
          Open -> Patch Available       9h 6m                   2                 Tom White       07/Sep/09 11:44
          Patch Available -> Resolved   3d 19h 30m              2                 Tom White       08/Sep/09 14:09
          Resolved -> Closed            350d 7h 38m             1                 Tom White       24/Aug/10 21:48

            People

            • Assignee:
              Brian Bockelman
              Reporter:
              Brian Bockelman
            • Votes:
              0
              Watchers:
              6
