Chukwa / CHUKWA-12

Add instrumentation API for Chukwa components

    Details

    • Type: New Feature
    • Status: Closed
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: None
    • Labels:
      None

      Description

      Chukwa components should be able to emit metrics in an easy way.
      I'm thinking of reusing the new Hadoop JMX instrumentation API to do that, plus a MetricsContext to output and collect the metrics using Chukwa.

      1. CHUKWA-12.patch
        48 kB
        Jerome Boulon
      2. chukwa12 errors.txt
        11 kB
        Ari Rabkin
      3. CHUKWA-12-2.patch
        49 kB
        Jerome Boulon
      4. CHUKWA-12-3.patch
        52 kB
        Jerome Boulon

          Activity

          Ari Rabkin added a comment -

          I just committed this. Thanks Jerome, by the way, for keeping at this, even while the tree was changing out from under you.

          Jerome Boulon added a comment -

          Previous patch didn't apply because of CHUKWA-65 (hadoop metrics log files should be managed more cleanly).

          On top of my previous comments, my patch:

          • Removes all changes made to Log4JMetricsContext.java by CHUKWA-65
          • Deletes conf/chukwa-hadoop-metrics-log4j.properties
            ---> this removes the dependency on an external log4j file; everything is now configured through hadoop-metrics.properties, since there is a well-defined way to pass parameters to the MetricsContext class
          • Uses the Java API to set read/write permission (RW for all, since another user needs to delete the file; see CHUKWA-65)
            --> this part is still not "clean" in my mind, since anyone can read and write the file, but I don't have a better idea for now
          • Moves conf/hadoop-metrics.properties to conf/hadoop-metrics.properties.template
          • Fixes bin/VERSION
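
As an illustration of the permission step in the list above, a minimal sketch using the standard java.io.File setters could look like the following. The class and method names are hypothetical and are not taken from the patch; only `setReadable` and `setWritable` are real JDK calls.

```java
import java.io.File;

// Hypothetical helper, not taken from the CHUKWA-12 patch.
class MetricsFilePermissions {
    // Grants read and write access to every user (ownerOnly = false).
    // Returns true only if both permission changes were applied.
    static boolean makeWorldReadWrite(File f) {
        boolean readable = f.setReadable(true, false);
        boolean writable = f.setWritable(true, false);
        return readable && writable;
    }
}
```

Passing `false` as the second argument is what opens the file to all users, which is the trade-off the comment calls not "clean".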
          Ari Rabkin added a comment -

          This patch doesn't apply to trunk anymore.

          Jerome Boulon added a comment -

          Same patch with the correct Hadoop 20 jar; agent.sh now uses the Hadoop 20 jar as well.

          Jerome Boulon added a comment -

          Sorry, my fault... I had HADOOP_HOME defined and pointing to Hadoop 20.

          Eric Yang added a comment -

          I got exactly the same error.

          Ari Rabkin added a comment - - edited

          Here are the errors I see. It's possible that something is quirky about my setup, but it was using a clean copy of trunk, so I'm nervous.

          It looks like the problem is that the test is using the hadoop-18 jar, which doesn't have the MetricsIntValue constructor you use. Any reason why my tests are using the wrong jar?

          Ari Rabkin added a comment - - edited

          This patch fails unit tests for me.

          Jerome Boulon added a comment -

          Thanks

          Ari Rabkin added a comment -

          +1.

          By the way, I appreciate the very complete explanation of what the patch does.

          Jerome Boulon added a comment -

          The new metrics instrumentation uses the current Hadoop metrics implementation, so we avoid duplicating code and maintaining it in two places.
          The current patch provides metrics for:

          • Agent
          • ChunkQueue
          • HTTPSender

          Metrics are available using the standard Hadoop Metrics context and JMX.

          However, the Hadoop AbstractMetricsContext has a "bug/feature", depending on who you are talking to.
          The AbstractMetricsContext does not reset values, and therefore outputs only accumulated values instead of rates.
          I've copied the Hadoop class into the Chukwa tree to fix this problem (it outputs both the rate and the accumulated value, to stay compatible).
          The idea is to test this functionality in Chukwa and then submit the change to Hadoop.
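
To make the difference between accumulated values and rates concrete, here is a minimal sketch of a counter that reports both. It is an illustration only: `RateAndTotalCounter` is a hypothetical class, not Hadoop's AbstractMetricsContext and not the copy added by the patch.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical counter, not Hadoop's AbstractMetricsContext.
class RateAndTotalCounter {
    private final AtomicLong total = new AtomicLong();          // never reset
    private final AtomicLong sinceLastEmit = new AtomicLong();  // reset on every emit

    void increment(long delta) {
        total.addAndGet(delta);
        sinceLastEmit.addAndGet(delta);
    }

    // Called once per reporting period.
    // Returns {valueInThisPeriod, accumulatedTotal} and resets only the per-period value.
    // The two fields are read separately, so under concurrent updates the pair
    // is a close approximation rather than an exact snapshot.
    long[] emit() {
        long inPeriod = sinceLastEmit.getAndSet(0);
        return new long[] { inPeriod, total.get() };
    }
}
```

Dividing the per-period value by the period length gives a rate, while the second element keeps the accumulated value available for consumers that already depend on it.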

          The current patch also:

          • fixes the chukwa-agent.jar creation
            • includes class files, not just .java files
          • fixes Log4JMetricsContext
            The previous Log4JMetricsContext contained a bug (CHUKWA-49), which is fixed here, and it was also hard to configure.
            To output dfs metrics, for example, we had to configure both the standard hadoop-metrics.properties and conf/chukwa-hadoop-metrics-log4j.properties.
            The current implementation uses only the hadoop-metrics.properties file and dynamically registers all appenders/loggers.

          There's an incompatible change:
          the RecordType was previously set in chukwa-hadoop-metrics-log4j.properties; now the recordType is set to the contextName.
          This should not be a problem, since we already have aliases on the demux parsers.
          Also, we now have to provide the metrics output directory in hadoop-metrics.properties.

          The "uuid" parameter appends the time in milliseconds to the log file's name to make it unique. This is required for Hadoop jvm/rpc
          metrics, since more than one process is running on the same machine.
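
The effect of the "uuid" parameter can be sketched as follows. The naming pattern (`<contextName>-<millis>.log`) and the `MetricsLogNames` class are assumptions made for illustration; the patch's actual file naming is not shown in this issue.

```java
// Hypothetical helper; the real naming scheme used by the patch is not shown here.
class MetricsLogNames {
    // Builds "<contextName>-<millis>.log" when uuid is true,
    // and "<contextName>.log" otherwise.
    static String logFileName(String contextName, boolean uuid, long nowMillis) {
        return uuid
            ? contextName + "-" + nowMillis + ".log"
            : contextName + ".log";
    }
}
```

A caller would typically pass `System.currentTimeMillis()` as `nowMillis`. Note that a millisecond timestamp only separates processes that start at different milliseconds.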

          I'm providing an updated version of chukwa-demux-conf.xml.template and hadoop-metrics.properties.

          Jerome Boulon added a comment -

          fix chukwa-agent.jar

          Jerome Boulon added a comment -
          • Add metrics instrumentation to Queue, Agent and HTTPSender
          • fix Log4JMetricsContext
            --> fix CHUKWA-49
          Jerome Boulon added a comment -

          Raise priority for 0.1.2 release.

          Jerome Boulon added a comment -

          Yes, that's the goal.
          Also, when this part is done, I will probably rewrite the Log4JMetricsContext and the Demux parser to be generic.

          Ari Rabkin added a comment -

          This sounds like a good thing. There's currently a lot of sort of yucky code to monitor and print these things. You might remove that code in this patch.

          Jerome Boulon added a comment -

          Here is a list of basic stats for Chukwa that I'm planning to expose while adding the instrumentation API:

          Agent Metrics

          • uptime
          • adaptor count

          HTTP Sender Metrics

          • HTTP post count/minute, exception count

          Collector Metrics

          • connections/minute, data size written/minute

          Later on, anybody will be able to add new metrics using the instrumentation API (e.g. adaptor-specific stats, queue stats, ...).
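
One way to picture how components could add their own metrics is a small registry of named gauges. This is a sketch under stated assumptions: `SimpleMetricsRegistry` and the metric name used with it are hypothetical and do not represent the Hadoop or Chukwa instrumentation API.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.LongSupplier;

// Hypothetical registry, not the Hadoop or Chukwa instrumentation API.
class SimpleMetricsRegistry {
    private final Map<String, LongSupplier> gauges = new LinkedHashMap<>();

    // Registers (or replaces) a gauge that is evaluated each time a snapshot is taken.
    synchronized void register(String name, LongSupplier gauge) {
        gauges.put(name, gauge);
    }

    // Reads every registered gauge and returns the current values in registration order.
    synchronized Map<String, Long> snapshot() {
        Map<String, Long> out = new LinkedHashMap<>();
        for (Map.Entry<String, LongSupplier> e : gauges.entrySet()) {
            out.put(e.getKey(), e.getValue().getAsLong());
        }
        return out;
    }
}
```

An agent could, for example, register a gauge named `agent.adaptorCount` (a made-up name) backed by the size of its adaptor list, and a periodic reporter would call `snapshot()` to collect the values.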


            People

            • Assignee:
              Jerome Boulon
              Reporter:
              Jerome Boulon
            • Votes:
              0
              Watchers:
              2
