Hadoop Common / HADOOP-2398

Additional Instrumentation for NameNode, RPC Layer and JMX support

    Details

    • Type: New Feature
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.16.0
    • Component/s: None
    • Labels: None

      Description

      Additional instrumentation is needed for the name node and its RPC layer. Furthermore, the instrumentation should be
      visible via JMX, Java's standard monitoring framework.

      1. metricsPatch6_5.patch
        55 kB
        Sanjay Radia
      2. metricsPatch6_4.patch
        55 kB
        Sanjay Radia
      3. metricsPatch6_3.txt
        55 kB
        Sanjay Radia
      4. metricsPatch6_2.txt
        54 kB
        Sanjay Radia
      5. metricsPatch6_1.txt
        54 kB
        Sanjay Radia
      6. ScreenShotRPCStats.png
        101 kB
        Sanjay Radia
      7. ScreenShotNameNodeStats.png
        104 kB
        Sanjay Radia
      8. metricsPatch6.txt
        51 kB
        Sanjay Radia

        Activity

        Hudson added a comment -

        Integrated in Hadoop-Nightly #366 (See http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Nightly/366/ )

        dhruba borthakur added a comment -

        I just committed this. Thanks Sanjay!

        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12373077/metricsPatch6_5.patch
        against trunk revision r611760.

        @author +1. The patch does not contain any @author tags.

        javadoc +1. The javadoc tool did not generate any warning messages.

        javac +1. The applied patch does not generate any new compiler warnings.

        findbugs +1. The patch does not introduce any new Findbugs warnings.

        core tests +1. The patch passed core unit tests.

        contrib tests -1. The patch failed contrib unit tests.

        Test results: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1582/testReport/
        Findbugs warnings: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1582/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Checkstyle results: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1582/artifact/trunk/build/test/checkstyle-errors.html
        Console output: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1582/console

        This message is automatically generated.

        Sanjay Radia added a comment -

        submitting: metricsPatch6_5.patch

        Sanjay Radia added a comment -

        fixed findbugs warning

        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12372639/metricsPatch6_4.patch
        against trunk revision r611264.

        @author +1. The patch does not contain any @author tags.

        javadoc +1. The javadoc tool did not generate any warning messages.

        javac +1. The applied patch does not generate any new compiler warnings.

        findbugs -1. The patch appears to introduce 1 new Findbugs warnings.

        core tests +1. The patch passed core unit tests.

        contrib tests +1. The patch passed contrib unit tests.

        Test results: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1547/testReport/
        Findbugs warnings: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1547/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Checkstyle results: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1547/artifact/trunk/build/test/checkstyle-errors.html
        Console output: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1547/console

        This message is automatically generated.

        Sanjay Radia added a comment -

        Submitting: metricsPatch6_4.patch

        Sanjay Radia added a comment -

        Patch against latest trunk

        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12372541/metricsPatch6_3.txt
        against trunk revision .

        @author +1. The patch does not contain any @author tags.

        patch -1. The patch command could not apply the patch.

        Console output: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1474/console

        This message is automatically generated.

        Sanjay Radia added a comment -

        Submitting metricsPatch6_3.txt

        Sanjay Radia added a comment -

        Cancelling metricsPatch6_2.txt

        Sanjay Radia added a comment -

        Fixed the javadoc, findbugs, and contrib failures.

        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12372475/metricsPatch6_2.txt
        against trunk revision .

        @author -1. The patch appears to contain 1 @author tags which the Hadoop community has agreed to not allow in code contributions.

        javadoc -1. The javadoc tool appears to have generated messages.

        javac +1. The applied patch does not generate any new compiler warnings.

        findbugs -1. The patch appears to cause Findbugs to fail.

        core tests -1. The patch failed core unit tests.

        contrib tests -1. The patch failed contrib unit tests.

        Test results: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1461/testReport/
        Checkstyle results: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1461/artifact/trunk/build/test/checkstyle-errors.html
        Console output: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1461/console

        This message is automatically generated.

        Sanjay Radia added a comment -

        Submitting my latest patch: metricsPatch6_2.txt

        Sanjay Radia added a comment -

        Same patch against later trunk.

        Raghu Angadi added a comment -

        +1.

        You might want to move all the 'HADOOP_OPTS="-Dcom.sun.management.jmxremote $HADOOP_OPTS"' lines into one if clause in bin/hadoop, which is functionally the same. But this is more of a personal preference.

        Sanjay Radia added a comment -

        Updated patch to reflect Raghu's feedback.

        Raghu Angadi added a comment - - edited

        A few comments:

        1. Remove LOG.debug() after invoke() in RPC.java.
        2. rpcDiscardedOps is not treated/named consistently compared to the other two variables in RpcMetrics and RpcMgt. Also, it is not updated or reset.
        3. RpcMetrics.log should be called LOG.
        4. As you mentioned in the conversation, right now the RpcMgtMBean interfaces and the access methods in RpcMgt are required by JMX. Once we move to the "Dynamic MBean" stuff most of it will go away. Same for NameNodeMgt and NameNodeMBean. registerMBean() could be in utils.
        5. createFile() etc in NameNodeMetrics don't need to be synchronized.

        Edit: Typos.
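
Point 4 suggests that registerMBean() could live in a shared utility. A minimal sketch of what such a helper might look like with the standard javax.management API follows. The class name MBeanUtilSketch, the "hadoop:service=...,name=..." object-name layout, and the Demo bean are assumptions made for illustration; they are not taken from the patch.

```java
import java.lang.management.ManagementFactory;
import javax.management.JMException;
import javax.management.MBeanServer;
import javax.management.ObjectName;

// Hypothetical helper: the class name and object-name layout are assumptions.
final class MBeanUtilSketch {
  private MBeanUtilSketch() {}

  /**
   * Registers the bean with the platform MBean server.
   * Returns the object name, or null if registration failed.
   */
  static ObjectName registerMBean(String service, String name, Object bean) {
    MBeanServer server = ManagementFactory.getPlatformMBeanServer();
    try {
      ObjectName objectName =
          new ObjectName("hadoop:service=" + service + ",name=" + name);
      server.registerMBean(bean, objectName);
      return objectName;
    } catch (JMException e) {
      // Monitoring is optional here, so a failed registration is not fatal.
      return null;
    }
  }

  /** Unregisters a bean that was registered with registerMBean(). */
  static void unregisterMBean(ObjectName objectName) {
    if (objectName == null) {
      return;
    }
    try {
      ManagementFactory.getPlatformMBeanServer().unregisterMBean(objectName);
    } catch (JMException e) {
      // The bean was already gone; nothing to clean up.
    }
  }

  // Standard MBean pair: JMX matches class Demo to interface DemoMBean by name.
  // The interface must be public for the MBean server to accept it.
  public interface DemoMBean {
    long getNumOps();
  }

  public static final class Demo implements DemoMBean {
    @Override
    public long getNumOps() {
      return 42L; // placeholder value for the sketch
    }
  }
}
```

With a helper along these lines, both the name node and RPC management beans could call a single registerMBean() instead of each carrying its own registration code.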

        Sanjay Radia added a comment -

        My patch does not use any dynamic mbeans, nor does it report the metrics dynamically by listing the declared metrics.
        This would be very useful to do, but I ran out of time for the 0.16 release.

        I will file a new Jira to modify the metrics to use dynamic mbeans. This would make the task of adding new metrics much
        simpler: one would not have to update an mbean for each new metric, and the metrics would also be published automatically to the
        metrics context.
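
To illustrate the dynamic mbean idea, here is a rough sketch built on the standard javax.management.DynamicMBean interface. The class DynamicMetricsSketch, its declare() method, and the metric names used with it are hypothetical; this is not the design of the follow-up Jira, only one way the approach could look.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicLong;
import javax.management.Attribute;
import javax.management.AttributeList;
import javax.management.AttributeNotFoundException;
import javax.management.DynamicMBean;
import javax.management.MBeanAttributeInfo;
import javax.management.MBeanInfo;
import javax.management.ReflectionException;

// Hypothetical sketch: attributes are discovered from a map at runtime,
// so no MBean interface has to be edited when a metric is added.
final class DynamicMetricsSketch implements DynamicMBean {
  private final Map<String, AtomicLong> metrics = new LinkedHashMap<>();

  /** Declaring a metric is enough to make it visible through JMX. */
  synchronized AtomicLong declare(String name) {
    return metrics.computeIfAbsent(name, k -> new AtomicLong());
  }

  @Override
  public synchronized Object getAttribute(String name)
      throws AttributeNotFoundException {
    AtomicLong value = metrics.get(name);
    if (value == null) {
      throw new AttributeNotFoundException(name);
    }
    return value.get();
  }

  @Override
  public synchronized AttributeList getAttributes(String[] names) {
    AttributeList list = new AttributeList();
    for (String n : names) {
      AtomicLong value = metrics.get(n);
      if (value != null) { // unknown names are skipped
        list.add(new Attribute(n, value.get()));
      }
    }
    return list;
  }

  @Override
  public synchronized MBeanInfo getMBeanInfo() {
    MBeanAttributeInfo[] attrs = new MBeanAttributeInfo[metrics.size()];
    int i = 0;
    for (String n : metrics.keySet()) {
      // readable, not writable, not an "is" getter
      attrs[i++] = new MBeanAttributeInfo(
          n, "java.lang.Long", "metric " + n, true, false, false);
    }
    return new MBeanInfo(getClass().getName(), "Declared metrics",
        attrs, null, null, null);
  }

  @Override
  public void setAttribute(Attribute attribute)
      throws AttributeNotFoundException {
    throw new AttributeNotFoundException(
        "metrics are read-only: " + attribute.getName());
  }

  @Override
  public AttributeList setAttributes(AttributeList attributes) {
    return new AttributeList(); // nothing is writable
  }

  @Override
  public Object invoke(String action, Object[] params, String[] signature)
      throws ReflectionException {
    throw new ReflectionException(new NoSuchMethodException(action));
  }
}
```

In this sketch a new metric needs only a declare("SomeName_num_ops") call; the attribute list that JConsole sees is rebuilt from the map each time getMBeanInfo() is called.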

        Sanjay Radia added a comment -

        The JMX part merely reads the same stats that are given to Simon.
        JMX reads the stats only when a client calls the JMX MBeans' get method.
        Hence there is no additional overhead.
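
The pull-on-read behaviour described above can be shown with a small standard MBean. This is only a sketch under assumptions: PullOnReadSketch, OpStats, the jmxReads() demo counter, and the "sketch:type=OpStats" object name are made up for illustration and do not appear in the patch.

```java
import java.lang.management.ManagementFactory;
import java.util.concurrent.atomic.AtomicLong;
import javax.management.JMException;
import javax.management.MBeanServer;
import javax.management.ObjectName;

// Hypothetical sketch: the getter runs only when a JMX client asks for the value.
final class PullOnReadSketch {
  private PullOnReadSketch() {}

  public interface OpStatsMBean {
    long getNumOps();
  }

  public static final class OpStats implements OpStatsMBean {
    private final AtomicLong numOps = new AtomicLong(); // updated on the hot path
    private final AtomicLong reads = new AtomicLong();  // demo only: counts getter calls

    void recordOp() {
      numOps.incrementAndGet();
    }

    @Override
    public long getNumOps() {
      reads.incrementAndGet();
      return numOps.get();
    }

    long jmxReads() {
      return reads.get();
    }
  }

  /** Reads the NumOps attribute through the MBean server, as a JMX client would. */
  static long readViaJmx(OpStats stats) {
    MBeanServer server = ManagementFactory.getPlatformMBeanServer();
    try {
      ObjectName name = new ObjectName(
          "sketch:type=OpStats,id=" + System.identityHashCode(stats));
      server.registerMBean(stats, name); // registration does not call the getter
      try {
        return (Long) server.getAttribute(name, "NumOps");
      } finally {
        server.unregisterMBean(name);
      }
    } catch (JMException e) {
      throw new IllegalStateException(e);
    }
  }
}
```

Recording operations never touches the getter; jmxReads() stays at zero until readViaJmx() (standing in for JConsole) requests the attribute.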

        Raghu Angadi added a comment -

        Sanjay,

        I haven't looked at the changes yet, but the features are pretty neat and very useful. Could you comment on any overhead? Is it the same as for the Simon stats? I think the Simon stats involved a synchronized section.

        Sanjay Radia added a comment -

        This screenshot shows JConsole displaying the RPC statistics for the name node.

        Note the tree on the left: it shows that name node stats and RPC stats for the name node are available.

        Any of the values can be graphed by clicking on the value. In this screenshot,
        I chose to graph the RPC's max processing and queued times.

        Sanjay Radia added a comment -

        This screenshot shows JConsole displaying the name node statistics.

        Note the tree on the left: it shows that name node stats and RPC stats for the name node are available.

        Any of the values can be graphed by clicking on the value. In this screenshot,
        I chose to graph the Journal Sync Average Time.

        Sanjay Radia added a comment -

        This patch adds the following features:

        + Additional instrumentation/metrics for

        • Name node
        • RPC layer

        + Adds a utility layer for maintaining metrics (see hadoop.metrics.util.*)

        • This makes it easier to add new metrics (fewer lines of code to add for each metric)
        • Maintains additional min and max metrics for rate-based metrics
        • Consistent naming of the suffix and prefix for metrics (e.g. Foo_num_ops, Foo_ave_time, etc.)
        • Provides an interface that lets monitoring systems like JMX read the metrics when one of JMX's
          clients reads them (without having an update
          thread that reads the metrics periodically)

        + Publishes all the name node instrumentation/metrics via JMX, Java's standard monitoring
        framework. This will allow one to connect JConsole to monitor the name node.
        Note that the cost is minimal: the mbeans are registered, but there is no additional cost unless someone
        is actually viewing the metrics.

        See NameNodeMgtMBean and RpcMgtMBean for the full list of instrumentation available for the name node as part of this patch.
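
As a rough picture of a rate-based metric that also keeps min and max, consider the sketch below. RateMetricSketch and its method names are assumptions for illustration and are not the classes in hadoop.metrics.util; only the _num_ops and _ave_time suffixes come from the naming convention mentioned above.

```java
// Hypothetical sketch of a rate-style metric; not the patch's implementation.
final class RateMetricSketch {
  private final String name;
  private long numOps;
  private long totalTime;
  private long minTime = Long.MAX_VALUE;
  private long maxTime;

  RateMetricSketch(String name) {
    this.name = name;
  }

  /** Records one operation that took timeMs milliseconds. */
  synchronized void inc(long timeMs) {
    numOps++;
    totalTime += timeMs;
    minTime = Math.min(minTime, timeMs);
    maxTime = Math.max(maxTime, timeMs);
  }

  synchronized long getNumOps() {
    return numOps;
  }

  /** Average time per operation (integer division), or 0 before any operation. */
  synchronized long getAveTime() {
    return numOps == 0 ? 0 : totalTime / numOps;
  }

  /** Smallest recorded time, or 0 before any operation. */
  synchronized long getMinTime() {
    return numOps == 0 ? 0 : minTime;
  }

  synchronized long getMaxTime() {
    return maxTime;
  }

  // Names follow the suffix convention from the description above.
  String numOpsName() {
    return name + "_num_ops";
  }

  String aveTimeName() {
    return name + "_ave_time";
  }
}
```

For example, after inc(4), inc(10), and inc(7) on a metric named "Syncs", the sketch reports three operations, an average of 7, a minimum of 4, and a maximum of 10, under the names Syncs_num_ops and Syncs_ave_time. The sketch uses synchronized methods for simplicity; a lock-free variant is possible if contention matters.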


          People

          • Assignee: Sanjay Radia
          • Reporter: Sanjay Radia
          • Votes: 0
          • Watchers: 0
