Hadoop HDFS
  1. Hadoop HDFS
  2. HDFS-1408

Herriot NN and DN clients should vend statistics

    Details

    • Type: Improvement Improvement
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: 0.22.0
    • Fix Version/s: 0.22.0
    • Component/s: test
    • Labels:
      None
    • Hadoop Flags:
      Reviewed
    • Tags:
      herriot

      Description

      The HDFS web user interface serves useful information through dfshealth.jsp and dfsnodelist.jsp.

      The Herriot interface to the namenode and datanode (as implemented in NNClient and DNClient, respectively) would benefit from the addition of some way to channel this information. In the case of DNClient this can be an injected method that returns a DatanodeDescriptor relevant to the underlying datanode.

      There seems to be no analagous NamenodeDescriptor. It may be useful to add this as a facade to a visitor that aggregates values across the filesystem datanodes. These values are (from dfshealth JSP):

      Configured Capacity
      DFS Used
      Non DFS Used
      DFS Remaining
      DFS Used%
      DFS Remaining%
      Live Nodes
      Dead Nodes
      Decommissioning Nodes
      Number of Under-Replicated Blocks

      Attributes reflecting the web user interface header may also be useful such as When-Started, Version, When-Compiled, and Upgrade-Status.

      A NamenodeDescriptor would essentially "push down" the code in dfshealth web UI behind a more general abstraction. If it is objectionable to make this class available in HDFS, perhaps this could be packaged in a Herriot specific way.

      1. jmx.patch
        7 kB
        Konstantin Boudnik
      2. jmx.patch
        18 kB
        Konstantin Boudnik
      3. jmx.patch
        19 kB
        Konstantin Boudnik
      4. jmx.patch
        15 kB
        Konstantin Boudnik
      5. HDFS-1408.patch
        8 kB
        Konstantin Boudnik
      6. HDFS-1408.patch
        3 kB
        Konstantin Boudnik
      7. HDFS-1408.patch
        5 kB
        Konstantin Boudnik
      8. HDFS-1408.patch
        4 kB
        Konstantin Boudnik
      9. HADOOP-6927.y20s.patch
        8 kB
        Konstantin Boudnik
      10. add_JMXlisteners.patch
        6 kB
        Konstantin Boudnik

        Issue Links

          Activity

          Hide
          Al Thompson added a comment -

          Best to use HDFS JMX support for gathering this info.

          Show
          Al Thompson added a comment - Best to use HDFS JMX support for gathering this info.
          Hide
          Al Thompson added a comment -

          This work should be restricted to providing access DataNodeDescriptor in Herriot's DNClient interface.
          Certain metrics such as blocks-per-datanode are not yet supported in HDFS' JMX MBeans.

          Show
          Al Thompson added a comment - This work should be restricted to providing access DataNodeDescriptor in Herriot's DNClient interface. Certain metrics such as blocks-per-datanode are not yet supported in HDFS' JMX MBeans.
          Hide
          Konstantin Boudnik added a comment -

          DatanodeDescriptor is namesystem specific entity. Therefore it seems to be more appropriate to expose this information via an injected API on namenode side ((NNProtocol).

          If this new API is required to be added via DNClient protocol then its every invocation will cause a subsequent RPC call. Instead, we can have a method in NNProtocol which will expose datanodeMap object from where a node descriptor can be received if a datanode registration is known.

          Show
          Konstantin Boudnik added a comment - DatanodeDescriptor is namesystem specific entity. Therefore it seems to be more appropriate to expose this information via an injected API on namenode side ((NNProtocol). If this new API is required to be added via DNClient protocol then its every invocation will cause a subsequent RPC call. Instead, we can have a method in NNProtocol which will expose datanodeMap object from where a node descriptor can be received if a datanode registration is known.
          Hide
          Konstantin Boudnik added a comment -

          That seems to be an adequate patch for y20S. Trunk patch is to follow.

          Show
          Konstantin Boudnik added a comment - That seems to be an adequate patch for y20S. Trunk patch is to follow.
          Hide
          Konstantin Boudnik added a comment -

          This should belong to HDFS starting from 0.21.

          Show
          Konstantin Boudnik added a comment - This should belong to HDFS starting from 0.21.
          Hide
          Konstantin Boudnik added a comment -

          After looking into this some more and talking off-line with Hdfs people I am leaning to the conclusion that statistics should be pulled via JMX interface. There's not much use to have a parallel set of API to provide an information which is already available someplace else (i.e. JMX).

          Instead of adding new instrumentation I would like to provide a primitive set of wrappers to facilitate a test access to JMX info. Basically, something like this should suffice:

            public List<String> getAllValues() throws Exception;
            public String getValue(String key) throws Exception;
          

          Patch is to follow.

          Show
          Konstantin Boudnik added a comment - After looking into this some more and talking off-line with Hdfs people I am leaning to the conclusion that statistics should be pulled via JMX interface. There's not much use to have a parallel set of API to provide an information which is already available someplace else (i.e. JMX). Instead of adding new instrumentation I would like to provide a primitive set of wrappers to facilitate a test access to JMX info. Basically, something like this should suffice: public List<String> getAllValues() throws Exception; public String getValue(String key) throws Exception; Patch is to follow.
          Hide
          Konstantin Boudnik added a comment -

          Here's a preliminary and very rough JMX based patch. Please be advised, that this patch is for y20s thus it mixes up what needs to be going into Common and Hdfs. The actual patch for 0.22 will have three different parts for all components.

          Show
          Konstantin Boudnik added a comment - Here's a preliminary and very rough JMX based patch. Please be advised, that this patch is for y20s thus it mixes up what needs to be going into Common and Hdfs. The actual patch for 0.22 will have three different parts for all components.
          Hide
          Konstantin Boudnik added a comment -

          This patch has the implementation for JMX client in herriot.
          Unfortunately, current JMX beans implementation in Hadoop doesn't provide notification broadcasting functionality so attributes can't be updated dynamically.

          This patch is for y20-security and I'll split it for 0.22

          Show
          Konstantin Boudnik added a comment - This patch has the implementation for JMX client in herriot. Unfortunately, current JMX beans implementation in Hadoop doesn't provide notification broadcasting functionality so attributes can't be updated dynamically. This patch is for y20-security and I'll split it for 0.22
          Hide
          Sharad Agarwal added a comment -

          current JMX beans implementation in Hadoop doesn't provide notification broadcasting functionality so attributes

          I think it is not a good idea to have listener based attribute updates as it looks to me a overkill and may lead to lot of remote traffic. Alternatively, tests can ask/poll for attribute values which anyway they have to do using the get API.

          Comments on patch:
          -jmxAttributes cache in AbstractDaemonClient shouldn't be static, no ?
          -at some places exceptions are caught and not thrown with proper message, will lead to Nullpointers in clients.
          -If providing the attribute change notification feature, then atleast it should be explicit (via a different API) instead of doing the automatic registration in getJmxAttribute method. All test clients may not need this feature and anyway they have to poll using the getAttribute API. So I would prefer to remove the notification feature.

          Show
          Sharad Agarwal added a comment - current JMX beans implementation in Hadoop doesn't provide notification broadcasting functionality so attributes I think it is not a good idea to have listener based attribute updates as it looks to me a overkill and may lead to lot of remote traffic. Alternatively, tests can ask/poll for attribute values which anyway they have to do using the get API. Comments on patch: -jmxAttributes cache in AbstractDaemonClient shouldn't be static, no ? -at some places exceptions are caught and not thrown with proper message, will lead to Nullpointers in clients. -If providing the attribute change notification feature, then atleast it should be explicit (via a different API) instead of doing the automatic registration in getJmxAttribute method. All test clients may not need this feature and anyway they have to poll using the getAttribute API. So I would prefer to remove the notification feature.
          Hide
          Konstantin Boudnik added a comment -

          Addressing Sharad's comments. I have moved listener registration to a separate API call which has to be called explicitly if a test needs to have that auto-update (which doesn't work in Hadoop at the moment). Otherwise, repetitive pull needs to be done to refresh an attribute's value.

          Show
          Konstantin Boudnik added a comment - Addressing Sharad's comments. I have moved listener registration to a separate API call which has to be called explicitly if a test needs to have that auto-update (which doesn't work in Hadoop at the moment). Otherwise, repetitive pull needs to be done to refresh an attribute's value.
          Hide
          Konstantin Boudnik added a comment -

          Also, I have ran updated HDFS system tests a number of times and new functionality works as expected. I'm going to post test-patch results shortly.

          Show
          Konstantin Boudnik added a comment - Also, I have ran updated HDFS system tests a number of times and new functionality works as expected. I'm going to post test-patch results shortly.
          Hide
          Konstantin Boudnik added a comment -

          After chatting with Sharad offline I agreed that providing notification listeners in Herriot while Hadoop JMX beans don't support it is like provide a new feature which doesn't work outright. Removing it from the main patch.

          In case we'd need it later I'm attaching the listeners specific patch for a reference.

          Show
          Konstantin Boudnik added a comment - After chatting with Sharad offline I agreed that providing notification listeners in Herriot while Hadoop JMX beans don't support it is like provide a new feature which doesn't work outright. Removing it from the main patch. In case we'd need it later I'm attaching the listeners specific patch for a reference.
          Hide
          Konstantin Boudnik added a comment -

          Patch for HDFS trunk.

          Show
          Konstantin Boudnik added a comment - Patch for HDFS trunk.
          Hide
          Konstantin Boudnik added a comment -

          jmx.patch has been moved over to HADOOP-6977 where it belongs.

          Show
          Konstantin Boudnik added a comment - jmx.patch has been moved over to HADOOP-6977 where it belongs.
          Hide
          Konstantin Boudnik added a comment -

          Changes corresponding to the latest HADOOP-6977 patch.

          Show
          Konstantin Boudnik added a comment - Changes corresponding to the latest HADOOP-6977 patch.
          Hide
          Boris Shkolnik added a comment -

          +1
          small nits:
          1. looks like the only purpose of readAttr in Test is to get capacity - shouldn't it be named like that?
          2. Instead of overwriting isJmxEnabled() and getJmxPortNumber() you could create and overwrite a method that gets and extra environment variable name.. So implementation of isJmxEnabled() is something like:
          isJmxEnabled(HADOO_OPT) || isJmxEnabled(getExtraEnvVarName())...
          this one is a matter of preference though....

          Show
          Boris Shkolnik added a comment - +1 small nits: 1. looks like the only purpose of readAttr in Test is to get capacity - shouldn't it be named like that? 2. Instead of overwriting isJmxEnabled() and getJmxPortNumber() you could create and overwrite a method that gets and extra environment variable name.. So implementation of isJmxEnabled() is something like: isJmxEnabled(HADOO_OPT) || isJmxEnabled(getExtraEnvVarName())... this one is a matter of preference though....
          Hide
          Konstantin Boudnik added a comment -

          New patch addresses the comment and is corresponding to the last update of HADOOP-6977

          Show
          Konstantin Boudnik added a comment - New patch addresses the comment and is corresponding to the last update of HADOOP-6977
          Hide
          Boris Shkolnik added a comment -

          +1

          Show
          Boris Shkolnik added a comment - +1
          Hide
          Konstantin Boudnik added a comment -

          Quick check.

          Show
          Konstantin Boudnik added a comment - Quick check.
          Hide
          Konstantin Boudnik added a comment -

          I have ran test-patch manually - below - and see the same 97 audit warnings as Todd got for his latest patch.

          Apparently, test-patch is having issues again. I'm going to commit this patch.

          {format}
          -1 overall.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 9 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs warnings.

          -1 release audit. The applied patch generated 97 release audit warnings (more than the trunk's current 1 warnings).

          +1 system test framework. The patch passed system test framework compile.{format}
          Show
          Konstantin Boudnik added a comment - I have ran test-patch manually - below - and see the same 97 audit warnings as Todd got for his latest patch. Apparently, test-patch is having issues again. I'm going to commit this patch. {format} -1 overall. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 9 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. -1 release audit. The applied patch generated 97 release audit warnings (more than the trunk's current 1 warnings). +1 system test framework. The patch passed system test framework compile.{format}
          Hide
          Konstantin Boudnik added a comment -

          I have just committed it to trunk.

          Show
          Konstantin Boudnik added a comment - I have just committed it to trunk.
          Hide
          Suresh Srinivas added a comment -

          On trunk when I do "ant inject-system-faults", I get the following errors. Is this related to this change?

          [echo] Start weaving aspects in place
          [iajc] /Users/sureshms/Documents/workspace/hadoop.hdfs/src/test/system/java/org/apache/hadoop/hdfs/test/system/DNClient.java:36 [error] The type DNClient must implement the inherited abstract method AbstractDaemonClient<DNProtocol>.getDaemonAttribute(String)
          [iajc] public class DNClient extends HDFSDaemonClient<DNProtocol> {
          [iajc] ^^^^^^^
          [iajc] /Users/sureshms/Documents/workspace/hadoop.hdfs/src/test/system/java/org/apache/hadoop/hdfs/test/system/DNClient.java:36 [error] The type DNClient must implement the inherited abstract method AbstractDaemonClient<DNProtocol>.getHadoopOptsEnvName()
          [iajc] public class DNClient extends HDFSDaemonClient<DNProtocol> {
          [iajc] ^^^^^^^
          [iajc] /Users/sureshms/Documents/workspace/hadoop.hdfs/src/test/system/java/org/apache/hadoop/hdfs/test/system/NNClient.java:31 [error] The type NNClient must implement the inherited abstract method AbstractDaemonClient<NNProtocol>.getDaemonAttribute(String)
          [iajc] public class NNClient extends HDFSDaemonClient<NNProtocol> {
          [iajc] ^^^^^^^
          [iajc] /Users/sureshms/Documents/workspace/hadoop.hdfs/src/test/system/java/org/apache/hadoop/hdfs/test/system/NNClient.java:31 [error] The type NNClient must implement the inherited abstract method AbstractDaemonClient<NNProtocol>.getHadoopOptsEnvName()
          [iajc] public class NNClient extends HDFSDaemonClient<NNProtocol> {
          [iajc] ^^^^^^^
          [iajc]
          [iajc] 4 errors

          Show
          Suresh Srinivas added a comment - On trunk when I do "ant inject-system-faults", I get the following errors. Is this related to this change? [echo] Start weaving aspects in place [iajc] /Users/sureshms/Documents/workspace/hadoop.hdfs/src/test/system/java/org/apache/hadoop/hdfs/test/system/DNClient.java:36 [error] The type DNClient must implement the inherited abstract method AbstractDaemonClient<DNProtocol>.getDaemonAttribute(String) [iajc] public class DNClient extends HDFSDaemonClient<DNProtocol> { [iajc] ^^^^^^^ [iajc] /Users/sureshms/Documents/workspace/hadoop.hdfs/src/test/system/java/org/apache/hadoop/hdfs/test/system/DNClient.java:36 [error] The type DNClient must implement the inherited abstract method AbstractDaemonClient<DNProtocol>.getHadoopOptsEnvName() [iajc] public class DNClient extends HDFSDaemonClient<DNProtocol> { [iajc] ^^^^^^^ [iajc] /Users/sureshms/Documents/workspace/hadoop.hdfs/src/test/system/java/org/apache/hadoop/hdfs/test/system/NNClient.java:31 [error] The type NNClient must implement the inherited abstract method AbstractDaemonClient<NNProtocol>.getDaemonAttribute(String) [iajc] public class NNClient extends HDFSDaemonClient<NNProtocol> { [iajc] ^^^^^^^ [iajc] /Users/sureshms/Documents/workspace/hadoop.hdfs/src/test/system/java/org/apache/hadoop/hdfs/test/system/NNClient.java:31 [error] The type NNClient must implement the inherited abstract method AbstractDaemonClient<NNProtocol>.getHadoopOptsEnvName() [iajc] public class NNClient extends HDFSDaemonClient<NNProtocol> { [iajc] ^^^^^^^ [iajc] [iajc] 4 errors
          Hide
          Suresh Srinivas added a comment -

          Additionally can you comment on this -1 from your test patch results:
          > -1 release audit. The applied patch generated 97 release audit warnings (more than the trunk's current 1 warnings).

          Show
          Suresh Srinivas added a comment - Additionally can you comment on this -1 from your test patch results: > -1 release audit. The applied patch generated 97 release audit warnings (more than the trunk's current 1 warnings).
          Hide
          Konstantin Boudnik added a comment -

          On trunk when I do "ant inject-system-faults", I get the following errors. Is this related to this change?

          Awesome... I have committed only CHANGES.txt modification Thanks for catching it up, I have delivered second commit with the change itself. Sorry about any inconvenience.

          Additionally can you comment on this -1 from your test patch results:

          Actually, I have in here - there seems to be some crazy issue with test-patch: even if executed on an empty patch it shows 97 warnings.

          Show
          Konstantin Boudnik added a comment - On trunk when I do "ant inject-system-faults", I get the following errors. Is this related to this change? Awesome... I have committed only CHANGES.txt modification Thanks for catching it up, I have delivered second commit with the change itself. Sorry about any inconvenience. Additionally can you comment on this -1 from your test patch results: Actually, I have in here - there seems to be some crazy issue with test-patch: even if executed on an empty patch it shows 97 warnings.

            People

            • Assignee:
              Konstantin Boudnik
              Reporter:
              Al Thompson
            • Votes:
              0 Vote for this issue
              Watchers:
              1 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development