Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.8.0
    • Component/s: None
    • Labels:

      Description

      Currently, there is no way to measure and observe Tajo's internal operations. As a Hadoop job, Tajo also should measure and record a lot of operational works.

      I would like to propose the use of Metrics (http://metrics.codahale.com/). It is well known as a stable and efficient metric library.

      1. tajo-ganglia-02.png
        622 kB
        Hyoungjun Kim
      2. tajo-ganglia-01.png
        538 kB
        Hyoungjun Kim
      3. TAJO-333_2.patch
        100 kB
        Hyoungjun Kim
      4. TAJO-333.patch
        82 kB
        Hyoungjun Kim

        Activity

        Hide
        jihoonson Jihoon Son added a comment -

        We need a document for the metric system.
        Hyoungjun, would you write the document, please?

        Show
        jihoonson Jihoon Son added a comment - We need a document for the metric system. Hyoungjun, would you write the document, please?
        Hide
        hudson Hudson added a comment -

        FAILURE: Integrated in Tajo-trunk-postcommit #617 (See https://builds.apache.org/job/Tajo-trunk-postcommit/617/)
        TAJO-333: Add metric system to Tajo. (hyoungjunkim via jihoon) (jihoonson: https://git-wip-us.apache.org/repos/asf?p=incubator-tajo.git&a=commit&h=62c49c05f522158d75d818df49100a0b1fd354bb)

        • tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/master/GlobalEngine.java
        • tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/util/metrics/reporter/NullReporter.java
        • tajo-core/tajo-core-backend/src/test/java/org/apache/tajo/util/metrics/TestSystemMetrics.java
        • tajo-core/tajo-core-backend/src/main/java/log4j.properties
        • tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/util/metrics/reporter/MetricsFileScheduledReporter.java
        • tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/util/metrics/reporter/MetricsStreamScheduledReporter.java
        • tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/util/metrics/TajoSystemMetrics.java
        • tajo-core/tajo-core-backend/src/test/java/org/apache/tajo/util/metrics/TestMetricsFilter.java
        • tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/master/metrics/WorkerResourceMetricsGaugeSet.java
        • tajo-common/src/main/java/org/apache/tajo/conf/TajoConf.java
        • tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/util/metrics/TajoMetrics.java
        • tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/master/TajoMaster.java
        • tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/master/querymaster/QueryMasterManagerService.java
        • tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/util/metrics/reporter/TajoMetricsReporter.java
        • tajo-core/tajo-core-backend/src/main/resources/tajo-metrics.properties
        • tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/util/metrics/reporter/TajoMetricsScheduledReporter.java
        • tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/worker/TajoWorkerManagerService.java
        • tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/util/metrics/RegexpMetricsFilter.java
        • tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/master/metrics/CatalogMetricsGaugeSet.java
        • tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/util/metrics/MetricsFilterList.java
        • tajo-core/tajo-core-backend/pom.xml
        • tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/util/metrics/LogEventGaugeSet.java
        • tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/worker/TaskRunner.java
        • tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/util/metrics/TajoLogEventCounter.java
        • tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/util/metrics/reporter/MetricsConsoleScheduledReporter.java
        • tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/worker/TajoWorker.java
        • CHANGES.txt
        • tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/worker/TaskRunnerManager.java
        • tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/util/metrics/reporter/MetricsConsoleReporter.java
        • tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/util/metrics/GroupNameMetricsFilter.java
        • tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/util/metrics/reporter/GangliaReporter.java
        • tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/master/querymaster/QueryMasterTask.java
        Show
        hudson Hudson added a comment - FAILURE: Integrated in Tajo-trunk-postcommit #617 (See https://builds.apache.org/job/Tajo-trunk-postcommit/617/ ) TAJO-333 : Add metric system to Tajo. (hyoungjunkim via jihoon) (jihoonson: https://git-wip-us.apache.org/repos/asf?p=incubator-tajo.git&a=commit&h=62c49c05f522158d75d818df49100a0b1fd354bb ) tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/master/GlobalEngine.java tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/util/metrics/reporter/NullReporter.java tajo-core/tajo-core-backend/src/test/java/org/apache/tajo/util/metrics/TestSystemMetrics.java tajo-core/tajo-core-backend/src/main/java/log4j.properties tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/util/metrics/reporter/MetricsFileScheduledReporter.java tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/util/metrics/reporter/MetricsStreamScheduledReporter.java tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/util/metrics/TajoSystemMetrics.java tajo-core/tajo-core-backend/src/test/java/org/apache/tajo/util/metrics/TestMetricsFilter.java tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/master/metrics/WorkerResourceMetricsGaugeSet.java tajo-common/src/main/java/org/apache/tajo/conf/TajoConf.java tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/util/metrics/TajoMetrics.java tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/master/TajoMaster.java tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/master/querymaster/QueryMasterManagerService.java tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/util/metrics/reporter/TajoMetricsReporter.java tajo-core/tajo-core-backend/src/main/resources/tajo-metrics.properties tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/util/metrics/reporter/TajoMetricsScheduledReporter.java tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/worker/TajoWorkerManagerService.java tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/util/metrics/RegexpMetricsFilter.java tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/master/metrics/CatalogMetricsGaugeSet.java tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/util/metrics/MetricsFilterList.java tajo-core/tajo-core-backend/pom.xml tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/util/metrics/LogEventGaugeSet.java tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/worker/TaskRunner.java tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/util/metrics/TajoLogEventCounter.java tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/util/metrics/reporter/MetricsConsoleScheduledReporter.java tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/worker/TajoWorker.java CHANGES.txt tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/worker/TaskRunnerManager.java tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/util/metrics/reporter/MetricsConsoleReporter.java tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/util/metrics/GroupNameMetricsFilter.java tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/util/metrics/reporter/GangliaReporter.java tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/master/querymaster/QueryMasterTask.java
        Hide
        jihoonson Jihoon Son added a comment -

        I've just committed the patch.
        Hyoungjun, thanks for your great contribution!

        Show
        jihoonson Jihoon Son added a comment - I've just committed the patch. Hyoungjun, thanks for your great contribution!
        Hide
        hyunsik Hyunsik Choi added a comment -

        +1
        It's a really great job. Ship it!

        Show
        hyunsik Hyunsik Choi added a comment - +1 It's a really great job. Ship it!
        Hide
        hyunsik Hyunsik Choi added a comment -

        I'm also reviewing the latest patch.

        Show
        hyunsik Hyunsik Choi added a comment - I'm also reviewing the latest patch.
        Hide
        hsaputra Henry Saputra added a comment -

        Looks very good, tried out the patch and pass compilation and base case of Tajo working.

        +1

        Show
        hsaputra Henry Saputra added a comment - Looks very good, tried out the patch and pass compilation and base case of Tajo working. +1
        Hide
        jihoonson Jihoon Son added a comment -

        I'll wait for others' reviews until tomorrow.

        Show
        jihoonson Jihoon Son added a comment - I'll wait for others' reviews until tomorrow.
        Hide
        jihoonson Jihoon Son added a comment -

        Hyoungjun helped me in face to face.
        There was a mistake in tajo-metrics.propertis.
        I tested the patch, and it works well.

        Here is my +1, and I'll commit this patch if there are any objections.
        Thanks, Hyoungjun.

        Show
        jihoonson Jihoon Son added a comment - Hyoungjun helped me in face to face. There was a mistake in tajo-metrics.propertis. I tested the patch, and it works well. Here is my +1, and I'll commit this patch if there are any objections. Thanks, Hyoungjun.
        Hide
        jihoonson Jihoon Son added a comment - - edited

        The Ganglia reporter looks really great!
        But, I have a question for the console reporter.
        I configured logj4.properties as in this patch.

        log4j.rootLogger=info,stdout,EventCounter
        log4j.threshhold=ALL
        log4j.appender.stdout=org.apache.log4j.ConsoleAppender
        log4j.appender.stdout.layout=org.apache.log4j.PatternLayout
        log4j.appender.stdout.layout.ConversionPattern=%d{ISO8601} %-5p %c{2} (%F:%M(%L)) - %m%n
        log4j.logger.org.apache.hadoop=WARN
        log4j.logger.org.apache.hadoop.conf=ERROR
        log4j.appender.EventCounter=org.apache.tajo.util.metrics.TajoLogEventCounter
        

        I run a query of "select count( * ) from lineitem" for a 1GB table on a single machine cluster.
        The console reports seems to be written to the query master log, but I can't find it anywhere.

        Would you tell me how I can see the console report?

        Show
        jihoonson Jihoon Son added a comment - - edited The Ganglia reporter looks really great! But, I have a question for the console reporter. I configured logj4.properties as in this patch. log4j.rootLogger=info,stdout,EventCounter log4j.threshhold=ALL log4j.appender.stdout=org.apache.log4j.ConsoleAppender log4j.appender.stdout.layout=org.apache.log4j.PatternLayout log4j.appender.stdout.layout.ConversionPattern=%d{ISO8601} %-5p %c{2} (%F:%M(%L)) - %m%n log4j.logger.org.apache.hadoop=WARN log4j.logger.org.apache.hadoop.conf=ERROR log4j.appender.EventCounter=org.apache.tajo.util.metrics.TajoLogEventCounter I run a query of "select count( * ) from lineitem" for a 1GB table on a single machine cluster. The console reports seems to be written to the query master log, but I can't find it anywhere. Would you tell me how I can see the console report?
        Hide
        jihoonson Jihoon Son added a comment -

        Thanks for your comment.
        The attached screenshot looks really great!
        I'll test it!

        Show
        jihoonson Jihoon Son added a comment - Thanks for your comment. The attached screenshot looks really great! I'll test it!
        Hide
        hjkim Hyoungjun Kim added a comment -

        When a external monitoring system gathers tajo's metrics data from all servers, each metrics item is required a field which has been generated by the server. For this reason, SystemMetrics prints out daemon's address and private network address is fine.

        Show
        hjkim Hyoungjun Kim added a comment - When a external monitoring system gathers tajo's metrics data from all servers, each metrics item is required a field which has been generated by the server. For this reason, SystemMetrics prints out daemon's address and private network address is fine.
        Hide
        jihoonson Jihoon Son added a comment -

        Thanks for your detailed comment!
        But, I have a question for the MetricsConsoleReporter issue.
        My above comment means that the master address can be the private network address such as 192.168.0.xxx.
        In that case, it seems to print the wrong host address.

        Show
        jihoonson Jihoon Son added a comment - Thanks for your detailed comment! But, I have a question for the MetricsConsoleReporter issue. My above comment means that the master address can be the private network address such as 192.168.0.xxx. In that case, it seems to print the wrong host address.
        Hide
        hjkim Hyoungjun Kim added a comment -

        Jihoon, thanks for your review.
        In this patch

        • change "runEB" to "executedExecutionBlocksNum"
        • add METRICS_PROPERTY_FILENAME variable in TajoConf.ConfVars
        • propertyChangeChecker issue In TajoSystemMetrics
          ReloadingStrategy of PropertiesConfiguration class has not thread for checking property file. So configurationChanged method fired when getXXX method called.
        • "localhost" In MetricsConsoleReporter issue
          TajoMetricsReporter does not print host and port when hostAndPort valus is null.
        • I removed unnecessary debug code and fixed spell in tajo-metrics.properties and removed unused imports.
        • I think that the main purpose of this issue is making metrics system and applying to daemon's metrics. So The adding task for query's metrics like the size of intermediate data which you mentioned is next step.
        • Added ganglia reporter
        Show
        hjkim Hyoungjun Kim added a comment - Jihoon, thanks for your review. In this patch change "runEB" to "executedExecutionBlocksNum" add METRICS_PROPERTY_FILENAME variable in TajoConf.ConfVars propertyChangeChecker issue In TajoSystemMetrics ReloadingStrategy of PropertiesConfiguration class has not thread for checking property file. So configurationChanged method fired when getXXX method called. "localhost" In MetricsConsoleReporter issue TajoMetricsReporter does not print host and port when hostAndPort valus is null. I removed unnecessary debug code and fixed spell in tajo-metrics.properties and removed unused imports. I think that the main purpose of this issue is making metrics system and applying to daemon's metrics. So The adding task for query's metrics like the size of intermediate data which you mentioned is next step. Added ganglia reporter
        Hide
        jihoonson Jihoon Son added a comment -

        In overall, the patch looks good to me.
        But, I have a couple of things to discuss.

        • For the high readability, it will be better to change the names of metrics to be more meaningful. For example, "executedExecutionBlocksNum" looks more meaningful than "runEB". Although the name lengths will increase, but it can increase readability.
        • The default name of metrics file can be better to move from TajoSystemMetrics to TajoConf.ConfVars. Also, it will be better to use TajoConf.getVar() instead of TajoConf.get() in TajoSystemMetrics.constructor().
        • In TajoSystemMetrics, propertyChangeChecker seems to have the same role with FileChangedReloadingStrategy. If this is my misunderstading, please explain to me.
        • Please remove the unused main function from TajoSystemMetrics.
        • In MetricsConsoleReporter, the hostAndPort is fixed as "localhost" and it looks to represent the master address. But, I think that it may be different in some environments. So, how about read the address from QueryMaster?
        • Please remove a debug code at Line 269 in TajoWorker.
        • In tajo-metrics.propertis, there is a typing error at Line 46. (worker-jvm-reporters=cnull => worker-jvm-reporters=null)
        • Please remove unused imports.

        And, how about add a metric for the size of intermediate data?
        I think that it will be useful.

        Show
        jihoonson Jihoon Son added a comment - In overall, the patch looks good to me. But, I have a couple of things to discuss. For the high readability, it will be better to change the names of metrics to be more meaningful. For example, "executedExecutionBlocksNum" looks more meaningful than "runEB". Although the name lengths will increase, but it can increase readability. The default name of metrics file can be better to move from TajoSystemMetrics to TajoConf.ConfVars. Also, it will be better to use TajoConf.getVar() instead of TajoConf.get() in TajoSystemMetrics.constructor(). In TajoSystemMetrics, propertyChangeChecker seems to have the same role with FileChangedReloadingStrategy. If this is my misunderstading, please explain to me. Please remove the unused main function from TajoSystemMetrics. In MetricsConsoleReporter, the hostAndPort is fixed as "localhost" and it looks to represent the master address. But, I think that it may be different in some environments. So, how about read the address from QueryMaster? Please remove a debug code at Line 269 in TajoWorker. In tajo-metrics.propertis, there is a typing error at Line 46. (worker-jvm-reporters=cnull => worker-jvm-reporters=null) Please remove unused imports. And, how about add a metric for the size of intermediate data? I think that it will be useful.
        Hide
        jihoonson Jihoon Son added a comment -

        Great work!
        I'll review the patch.

        Show
        jihoonson Jihoon Son added a comment - Great work! I'll review the patch.
        Hide
        hjkim Hyoungjun Kim added a comment -

        This patch is first patch for tajo’s metric system.

        Tajo metric system has two metrics type.

        system metric

        A system metric which likes daemon’s JVM status, number of worker is emits metrics data periodically.
        A system metric set properties(reporter class, period) with conf/tajo-metrics.properties file.
        In the current patch, A system metrics collects minimal information about daemon. If you need more information, please comment.

        basic metric

        The basic metric can be used at any time.
        For example, If there is more query runtime statistics data, A query optimizer generates better query plan.
        Every query can has a metrics object and can emit metrics data periodically or query closing time.

        TajoMetrics tajoMetrics = new TajoMetrics();
        tajoMetrics.counter(queryId, “numScanBytes”).inc(scanBytes);
        …
        tajoMetrics.report(new TajoMetricsReporter() {
        public void report(SortedMap<String, Gauge> gauges,
                                      SortedMap<String, Counter> counters,
                                      SortedMap<String, Histogram> histograms,
                                      SortedMap<String, Meter> meters,
                                      SortedMap<String, Timer> timers) {
            …
          }
        });
        
        Show
        hjkim Hyoungjun Kim added a comment - This patch is first patch for tajo’s metric system. Tajo metric system has two metrics type. system metric A system metric which likes daemon’s JVM status, number of worker is emits metrics data periodically. A system metric set properties(reporter class, period) with conf/tajo-metrics.properties file. In the current patch, A system metrics collects minimal information about daemon. If you need more information, please comment. basic metric The basic metric can be used at any time. For example, If there is more query runtime statistics data, A query optimizer generates better query plan. Every query can has a metrics object and can emit metrics data periodically or query closing time. TajoMetrics tajoMetrics = new TajoMetrics(); tajoMetrics.counter(queryId, “numScanBytes”).inc(scanBytes); … tajoMetrics.report( new TajoMetricsReporter() { public void report(SortedMap< String , Gauge> gauges, SortedMap< String , Counter> counters, SortedMap< String , Histogram> histograms, SortedMap< String , Meter> meters, SortedMap< String , Timer> timers) { … } });
        Hide
        hsaputra Henry Saputra added a comment -
        Show
        hsaputra Henry Saputra added a comment - +1 for Metrics ( http://metrics.codahale.com/ )
        Hide
        jihoonson Jihoon Son added a comment -

        +1 for this issue.
        The metric information will be very useful for various purpose such as the query optimization.

        Show
        jihoonson Jihoon Son added a comment - +1 for this issue. The metric information will be very useful for various purpose such as the query optimization.
        Hide
        sirpkt Keuntae Park added a comment -

        I totally agree with you about the need for metric system,
        and Metrics looks very good for me, too.

        Is there any similar c++ library for the future c++ worker ?

        Show
        sirpkt Keuntae Park added a comment - I totally agree with you about the need for metric system, and Metrics looks very good for me, too. Is there any similar c++ library for the future c++ worker ?

          People

          • Assignee:
            hjkim Hyoungjun Kim
            Reporter:
            hyunsik Hyunsik Choi
          • Votes:
            0 Vote for this issue
            Watchers:
            5 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development