KAFKA-541: Use metrics CSV reporter instead of jmx tool for system tests

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.8.0
    • Fix Version/s: None
    • Component/s: None
    • Labels: None

      Description

      The existing system test framework spawns off a bunch of jmxtool processes to collect metrics. This is rather heavyweight and also requires advance knowledge of all the beans (many of which are dynamically registered). E.g., per-topic stats pop up only after the topics are produced to.

      Since we are using metrics-core, we can just turn on the CSV reporter to collect these stats. I had originally thought version 2.1.3 had various bugs that rendered the CSV reporter unusable, but I gave it another try and it seems to be fine. Will post some output.
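Since the CSV reporter writes one file per metric with a header row, consuming the output from the test framework is straightforward. A minimal Python sketch, using a hypothetical meter CSV (the exact column names come from the metrics-core reporter and may differ):

```python
import csv
import io

def read_metric_csv(text):
    """Parse a metrics-core CSV report (one file per metric) into
    a list of rows keyed by the header columns."""
    reader = csv.DictReader(io.StringIO(text))
    return list(reader)

# Hypothetical sample resembling the CSV reporter's output for a meter.
sample = (
    "time,count,1 min rate,mean rate,5 min rate,15 min rate\n"
    "5,120,24.0,24.0,24.0,24.0\n"
    "10,260,26.5,26.0,25.8,25.6\n"
)

rows = read_metric_csv(sample)
print(rows[1]["count"])  # → 260
```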

      1. kafka_541_v1.diff
        13 kB
        Yang Ye
      2. kafka_541_v2.diff
        23 kB
        Yang Ye
      3. metrics-core-3.0.1.jar
        80 kB
        Yang Ye
      4. metrics-annotation-3.0.1.jar
        5 kB
        Yang Ye
      5. kafka_541_v3.diff
        36 kB
        Yang Ye


          Activity

          Joel Koshy added a comment -

          Some sample output:

          https://gist.github.com/3830723

          Joel Koshy added a comment -

          Actually, now I remember. metrics 3.0.0 fixes this issue: https://github.com/codahale/metrics/pull/225

          It affects us because we have different scopes for similar metrics. e.g., NumDelayedRequests in both FetchRequestPurgatory and ProducerRequestPurgatory, so they collide.

          E.g., with metrics 2.1.3 we would just have NumDelayedRequests.csv but with 3.0.0 we would have kafka.server.ProducerRequestPurgatory.NumDelayedRequests.csv and kafka.server.FetchRequestPurgatory.NumDelayedRequests.csv
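The collision and its fix can be illustrated with a small sketch of the two naming schemes (the metric names are taken from the comment above; the helper functions are hypothetical):

```python
# Under metrics 2.1.3 the CSV file was named after the bare metric
# name, so same-named metrics in different scopes collided on one file.
def old_csv_name(name):
    return name + ".csv"

# Under metrics 3.0.0 the file name includes the metric's group and
# type, keeping the two purgatories' files distinct.
def new_csv_name(group, type_, name):
    return "%s.%s.%s.csv" % (group, type_, name)

old = old_csv_name("NumDelayedRequests")  # both purgatories map here
producer = new_csv_name("kafka.server", "ProducerRequestPurgatory", "NumDelayedRequests")
fetch = new_csv_name("kafka.server", "FetchRequestPurgatory", "NumDelayedRequests")
```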

          I'll file a separate jira to upgrade to 3.* as this one is for getting our system test framework to consume the generated csv files.

          Joel Koshy added a comment -

          Also, FYI, here are the configs to add to a server to get CSV reporting enabled on startup. It can also be started through a JMX operation if not enabled at start-up.

          kafka.metrics.polling.interval.secs=5
          kafka.metrics.reporters=kafka.metrics.KafkaCSVMetricsReporter
          kafka.csv.metrics.dir=kafka_metrics
          kafka.csv.metrics.reporter.enabled=true
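A minimal sketch of wiring these properties into a broker config from the system-test side. The helper names are hypothetical, not the actual generate_overriden_props_files:

```python
# The four CSV reporter settings from the comment above.
CSV_REPORTER_PROPS = {
    "kafka.metrics.polling.interval.secs": "5",
    "kafka.metrics.reporters": "kafka.metrics.KafkaCSVMetricsReporter",
    "kafka.csv.metrics.dir": "kafka_metrics",
    "kafka.csv.metrics.reporter.enabled": "true",
}

def csv_reporter_prop_lines():
    """Render the settings as key=value lines for a properties file."""
    return ["%s=%s" % (k, v) for k, v in sorted(CSV_REPORTER_PROPS.items())]

def append_csv_reporter_props(props_path):
    """Append the CSV reporter settings to an existing properties file."""
    with open(props_path, "a") as f:
        f.write("\n".join(csv_reporter_prop_lines()) + "\n")
```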

          Neha Narkhede added a comment -

          +1, this will be great to have. Also, even with the upgrade, we still need a way to filter which attributes we want to plot per mbean for system tests, and to organize the graphs in dashboards per role. I think we can use the existing metrics.json to specify this, unless you are suggesting to change that?

          Joel Koshy added a comment -

          That's right - we would just use metrics.json, although instead of a bean it would specify the name of the CSV file, and some of the column names may be different.

          Yang Ye added a comment -

          1. Currently, only the broker supports the CSV reporter; we added support for it to the producer and consumer.

          2. In system test, disable the

          3. The output of the CSV reporter is not quite the same as the JMX metrics collection's: some lines of the output file may be empty (we tried to avoid them) and the headers of the CSV files are different, so we added some mapping in kafka_system_test_utils.py to reconcile them.

          4. Changed "generate_overriden_props_files" to add CSV-supporting properties to the broker config.

          NOT DONE YET:
          producer_performance and console_consumer don't support the CSV reporter yet. It's hard to do because they don't accept a properties file as input; instead they take the configuration directly from the command line. Will think about how to make them support it.
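A sketch of the kind of header mapping point 3 describes: CSV reporter column names on the left, the JMX attribute names the test framework already knows on the right. The actual mapping in kafka_system_test_utils.py may use different names; this one is illustrative:

```python
# Hypothetical CSV-column-to-JMX-attribute mapping.
CSV_TO_JMX_ATTRIBUTE = {
    "1 min rate": "OneMinuteRate",
    "mean rate": "MeanRate",
    "count": "Count",
}

def normalize_header(columns):
    """Translate CSV column names to their JMX equivalents, dropping
    the empty columns the CSV reporter sometimes emits."""
    return [CSV_TO_JMX_ATTRIBUTE.get(c, c) for c in columns if c]

print(normalize_header(["time", "count", "1 min rate", ""]))
# → ['time', 'Count', 'OneMinuteRate']
```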

          Jun Rao added a comment -

          Thanks for the patch. Some comments:

          1. I see the following error while running run_sanity.sh in system test.
          2012-10-19 07:52:05,210 - ERROR - ERROR while plotting graph /home/jrao/Intellij_workspace/kafka_0.8_203/system_test/replication_testsuite/testcase_1/dashboards/broker/kafka.server:type=BrokerTopicMetrics,name=AllTopicsBytesInPerSec:OneMinuteRate.svg: float division (metrics)

          2. kafka_system_test_utils.py: The following line doesn't look right and there are multiple instances of this.
          + addedCSVConfig["kfka.metrics.polling.interval.secsafka.csv.metrics.reporter.enabled"] = "true"

          3. We probably shouldn't register the metrics reporter in Producer and ZookeeperConsumerConnector since a single jvm can create multiple instances of each. Instead, we could make a util that registers the metrics reporter and invoke it in standalone tools like ProducerPerformance and ConsoleConsumer.
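For the float-division error in point 1, the usual fix is to guard the rate computation in the plotting code against an empty sampling interval; a sketch (the function name is hypothetical, not the actual metrics.py code):

```python
# If a CSV has no samples in an interval the denominator can be zero,
# so rate computation should fall back to 0.0 rather than raise
# ZeroDivisionError while plotting.
def safe_rate(delta_count, delta_seconds):
    if delta_seconds == 0:
        return 0.0
    return delta_count / float(delta_seconds)
```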

          Yang Ye added a comment -

          Changes since first patch:

          1. Made the Kafka CSV metrics reporter a singleton (only one instance per JVM).

          2. Added CSV metrics reporter support for producer / consumer / producer-performance / console-consumer.

          3. Fixed a few places in the system test to make it actually work with producer_performance and console-consumer, and to plot graphs correctly.

          4. There's only one problem left: some of the metrics are timers, and according to the docs a timer reports data as both a Meter and a Histogram (this is also verified in the JMX console).

          But for the exported CSV file, we only see a header like:

          time,min,max,mean,median,stddev,95%,99%,99.9%

          That is to say, we cannot find the Meter / rate-related metrics.

          For a few graphs requiring "One minute rate" from a timer metric (and only for them), the data are not reported and the graphs are not plotted.
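The gap can be checked mechanically: the exported timer header contains only histogram columns and none of the meter columns a rate graph needs. A sketch (the meter column names are assumptions about the reporter's output):

```python
# Header actually exported for a timer, per the comment above.
TIMER_HEADER = "time,min,max,mean,median,stddev,95%,99%,99.9%"

# Columns a Meter would contribute (assumed names).
METER_COLUMNS = {"count", "1 min rate", "5 min rate", "15 min rate", "mean rate"}

def has_rate_columns(header_line):
    """True if the CSV header carries any meter/rate columns."""
    columns = set(header_line.split(","))
    return bool(columns & METER_COLUMNS)

print(has_rate_columns(TIMER_HEADER))  # → False
```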

          Yang Ye added a comment -

          Changes since v2:

          1. Patched the Coda Hale metrics project and built new jars, which fixes the problem of the CsvReporter not reporting meter attributes for timer metrics.

          2. Fixed metrics.py by not passing non-existent files into the inputCsvFiles argument of the plot_graphs() function.

          3. Changed the metrics in the Kafka RequestMetrics class from nanoseconds to milliseconds.

          4. Built a getter for the "props" attribute in ConsumerConfig.

          5. Made KafkaCSVMetricsReporter a singleton; only one instance can be started within each JVM.
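Point 2 amounts to a simple filter applied before calling plot_graphs() (not shown here); a sketch, with a hypothetical helper name:

```python
import os

# Drop paths that don't exist before handing the list to the plotter,
# so it never tries to open a missing CSV file.
def existing_csv_files(paths):
    return [p for p in paths if os.path.exists(p)]
```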

          Neha Narkhede added a comment -

          Committed v4 after making the following changes to v3 -

          1. Fixed a logging statement in metrics.py since the newly added statement threw a runtime error
          2. Fixed KafkaProject.scala to point to the right metrics package version (3.0.1)
          3. Fixed the ConsumerConfig constructor argument to be a val and removed the getter API added in v3

          Joel Koshy added a comment -

          Delayed review, but nevertheless:

          The KafkaCSVMetricsReporter object may be better named as KafkaMetricsReporter since startCSVMetricsReporter
          potentially starts up other (non-CSV) reporters (if any) as well - in which case KafkaMetricsReporter.scala would be a
          better place for it. Or, you can just filter out non-CSV reporters.

          Also, the top-level/config/server.properties need not enable the csv reporter. I thought the system test replication
          suite's server.properties would need to be patched, but it isn't. Does this mean the test suite picks up the top-level
          config as a template?

          I can take care of the above in some other (unrelated) jiras that I have on my plate. However:

          Can you please attach the metrics patch to fix the Timer CSV reporting? and ideally, submit a pull-request to the
          metrics project? We should get off our patched version of metrics as soon as metrics-core@github is patched. (I had put
          the metrics github hash in the previous jar so we know exactly which version of the code we are on.)

          Joel Koshy added a comment -

          Looks like your pull request is in. I'll file another jira to do the above.


            People

            • Assignee: Yang Ye
            • Reporter: Joel Koshy
            • Votes: 0
            • Watchers: 4