Cassandra
  1. Cassandra
  2. CASSANDRA-4009

Increase usage of Metrics and flesh out o.a.c.metrics

    Details

    • Type: Improvement Improvement
    • Status: Resolved
    • Priority: Minor Minor
    • Resolution: Fixed
    • Fix Version/s: 1.2.0
    • Component/s: None
    • Labels:
      None

      Description

      With CASSANDRA-3671 we have begun using the Metrics packages to expose stats in a new JMX structure, intended to be more user-friendly (for example, you don't need to know what a StorageProxy is or does.) This ticket serves as a parent for subtasks to finish fleshing out the rest of the enhanced metrics.

      1. 4009.txt
        97 kB
        Yuki Morishita
      2. 4009.txt
        125 kB
        Yuki Morishita
      3. 4009-v2.txt
        140 kB
        Yuki Morishita

        Issue Links

          Activity

          Hide
          Hudson added a comment -

          Integrated in Cassandra #1984 (See https://builds.apache.org/job/Cassandra/1984/)
          New metrics; patch by yukim reviewed by brandonwilliams for CASSANDRA-4009 (Revision 69cedbfcaad18f40dcf97648c8adf129614054db)

          Result = ABORTED
          yukim :
          Files :

          • src/java/org/apache/cassandra/db/ColumnFamilyStore.java
          • src/java/org/apache/cassandra/metrics/StreamingMetrics.java
          • src/java/org/apache/cassandra/metrics/DroppedMessageMetrics.java
          • src/java/org/apache/cassandra/service/StorageProxyMBean.java
          • src/java/org/apache/cassandra/scheduler/WeightedQueueMBean.java
          • src/java/org/apache/cassandra/metrics/ThreadPoolMetrics.java
          • src/java/org/apache/cassandra/metrics/CommitLogMetrics.java
          • src/java/org/apache/cassandra/net/OutboundTcpConnectionPool.java
          • src/java/org/apache/cassandra/streaming/IncomingStreamReader.java
          • src/java/org/apache/cassandra/service/CacheServiceMBean.java
          • src/java/org/apache/cassandra/metrics/ClientRequestMetrics.java
          • CHANGES.txt
          • src/java/org/apache/cassandra/scheduler/WeightedQueue.java
          • src/java/org/apache/cassandra/concurrent/JMXEnabledThreadPoolExecutor.java
          • src/java/org/apache/cassandra/db/compaction/CompactionManagerMBean.java
          • src/java/org/apache/cassandra/streaming/FileStreamTask.java
          • src/java/org/apache/cassandra/concurrent/IExecutorMBean.java
          • src/java/org/apache/cassandra/cache/AutoSavingCache.java
          • src/java/org/apache/cassandra/metrics/CacheMetrics.java
          • src/java/org/apache/cassandra/net/MessagingService.java
          • src/java/org/apache/cassandra/db/compaction/CompactionManager.java
          • src/java/org/apache/cassandra/metrics/LatencyMetrics.java
          • src/java/org/apache/cassandra/metrics/MetricNameFactory.java
          • src/java/org/apache/cassandra/concurrent/JMXEnabledThreadPoolExecutorMBean.java
          • src/java/org/apache/cassandra/service/StorageService.java
          • src/java/org/apache/cassandra/service/CacheService.java
          • src/java/org/apache/cassandra/service/StorageProxy.java
          • src/java/org/apache/cassandra/db/commitlog/CommitLog.java
          • src/java/org/apache/cassandra/metrics/ColumnFamilyMetrics.java
          • src/java/org/apache/cassandra/db/DataTracker.java
          • src/java/org/apache/cassandra/metrics/CompactionMetrics.java
          • src/java/org/apache/cassandra/metrics/StorageMetrics.java
          • src/java/org/apache/cassandra/db/ColumnFamilyStoreMBean.java
          • src/java/org/apache/cassandra/service/StorageServiceMBean.java
          • src/java/org/apache/cassandra/db/commitlog/CommitLogMBean.java
          • src/java/org/apache/cassandra/cache/InstrumentingCache.java
          • src/java/org/apache/cassandra/metrics/ConnectionMetrics.java
          • src/java/org/apache/cassandra/streaming/compress/CompressedFileStreamTask.java
          Show
          Hudson added a comment - Integrated in Cassandra #1984 (See https://builds.apache.org/job/Cassandra/1984/ ) New metrics; patch by yukim reviewed by brandonwilliams for CASSANDRA-4009 (Revision 69cedbfcaad18f40dcf97648c8adf129614054db) Result = ABORTED yukim : Files : src/java/org/apache/cassandra/db/ColumnFamilyStore.java src/java/org/apache/cassandra/metrics/StreamingMetrics.java src/java/org/apache/cassandra/metrics/DroppedMessageMetrics.java src/java/org/apache/cassandra/service/StorageProxyMBean.java src/java/org/apache/cassandra/scheduler/WeightedQueueMBean.java src/java/org/apache/cassandra/metrics/ThreadPoolMetrics.java src/java/org/apache/cassandra/metrics/CommitLogMetrics.java src/java/org/apache/cassandra/net/OutboundTcpConnectionPool.java src/java/org/apache/cassandra/streaming/IncomingStreamReader.java src/java/org/apache/cassandra/service/CacheServiceMBean.java src/java/org/apache/cassandra/metrics/ClientRequestMetrics.java CHANGES.txt src/java/org/apache/cassandra/scheduler/WeightedQueue.java src/java/org/apache/cassandra/concurrent/JMXEnabledThreadPoolExecutor.java src/java/org/apache/cassandra/db/compaction/CompactionManagerMBean.java src/java/org/apache/cassandra/streaming/FileStreamTask.java src/java/org/apache/cassandra/concurrent/IExecutorMBean.java src/java/org/apache/cassandra/cache/AutoSavingCache.java src/java/org/apache/cassandra/metrics/CacheMetrics.java src/java/org/apache/cassandra/net/MessagingService.java src/java/org/apache/cassandra/db/compaction/CompactionManager.java src/java/org/apache/cassandra/metrics/LatencyMetrics.java src/java/org/apache/cassandra/metrics/MetricNameFactory.java src/java/org/apache/cassandra/concurrent/JMXEnabledThreadPoolExecutorMBean.java src/java/org/apache/cassandra/service/StorageService.java src/java/org/apache/cassandra/service/CacheService.java src/java/org/apache/cassandra/service/StorageProxy.java src/java/org/apache/cassandra/db/commitlog/CommitLog.java src/java/org/apache/cassandra/metrics/ColumnFamilyMetrics.java src/java/org/apache/cassandra/db/DataTracker.java src/java/org/apache/cassandra/metrics/CompactionMetrics.java src/java/org/apache/cassandra/metrics/StorageMetrics.java src/java/org/apache/cassandra/db/ColumnFamilyStoreMBean.java src/java/org/apache/cassandra/service/StorageServiceMBean.java src/java/org/apache/cassandra/db/commitlog/CommitLogMBean.java src/java/org/apache/cassandra/cache/InstrumentingCache.java src/java/org/apache/cassandra/metrics/ConnectionMetrics.java src/java/org/apache/cassandra/streaming/compress/CompressedFileStreamTask.java
          Hide
          Yuki Morishita added a comment -

          Committed. Thanks for review.

          Show
          Yuki Morishita added a comment - Committed. Thanks for review.
          Hide
          Brandon Williams added a comment -

          +1

          Show
          Brandon Williams added a comment - +1
          Hide
          Yuki Morishita added a comment -

          Rebased version attached.

          Show
          Yuki Morishita added a comment - Rebased version attached.
          Hide
          Brandon Williams added a comment -

          Can you rebase? Sorry I didn't quite get to this in time.

          Show
          Brandon Williams added a comment - Can you rebase? Sorry I didn't quite get to this in time.
          Hide
          Yuki Morishita added a comment -

          V2 attached.
          github: https://github.com/yukim/cassandra/tree/4009-2
          spec: https://gist.github.com/3013075

          Major changes:

          • "Recent" metrics are removed from new version and replaced with Meter type which provides 1-, 5-, 15-min average. (almost all. recent bloomfilter metrics still remain since I could not figure out clean way to access metrics from SSTableReader object.)
          • Use Timer, RateGauge, Histogram, where possible.
          • Add Streaming metrics.
          Show
          Yuki Morishita added a comment - V2 attached. github: https://github.com/yukim/cassandra/tree/4009-2 spec: https://gist.github.com/3013075 Major changes: "Recent" metrics are removed from new version and replaced with Meter type which provides 1-, 5-, 15-min average. (almost all. recent bloomfilter metrics still remain since I could not figure out clean way to access metrics from SSTableReader object.) Use Timer, RateGauge, Histogram, where possible. Add Streaming metrics.
          Hide
          Yuki Morishita added a comment -

          OK, I'll do it.

          Show
          Yuki Morishita added a comment - OK, I'll do it.
          Hide
          Brandon Williams added a comment -

          If it's not too expensive (and I imagine it's not) that sounds reasonable.

          Show
          Brandon Williams added a comment - If it's not too expensive (and I imagine it's not) that sounds reasonable.
          Hide
          Nick Bailey added a comment -

          I would imagine it would be better to have the new metrics take full advantage of all of the stuff Metrics provides right off the bat. Having to count a few metrics twice will be slightly annoying i guess until 1.3. At that point though we can remove the older deprecated metrics right?

          Show
          Nick Bailey added a comment - I would imagine it would be better to have the new metrics take full advantage of all of the stuff Metrics provides right off the bat. Having to count a few metrics twice will be slightly annoying i guess until 1.3. At that point though we can remove the older deprecated metrics right?
          Hide
          Yuki Morishita added a comment -

          If we can implement current metrics without breaking interface compatibility, it is possible to use more types from library. For example, the reason I didn't use Histogram is that it is impossible to implement current "recent" metrics whose value is cleared for every access. Some hit ratio have "recent" ones too.
          One possible way is we can provide incompatible new metrics using native Metrics library types while preserving current metrics by measuring twice for some metrics, but we will not providing "recent" metrics for new system.

          Show
          Yuki Morishita added a comment - If we can implement current metrics without breaking interface compatibility, it is possible to use more types from library. For example, the reason I didn't use Histogram is that it is impossible to implement current "recent" metrics whose value is cleared for every access. Some hit ratio have "recent" ones too. One possible way is we can provide incompatible new metrics using native Metrics library types while preserving current metrics by measuring twice for some metrics, but we will not providing "recent" metrics for new system.
          Hide
          Nick Bailey added a comment -

          I wonder if we couldn't take advantage of some of the built in stuff in Metrics. For example the built in histograms rather than our custom built ones. Or perhaps the RatioGauge type for some of the hit rate stuff.

          Show
          Nick Bailey added a comment - I wonder if we couldn't take advantage of some of the built in stuff in Metrics. For example the built in histograms rather than our custom built ones. Or perhaps the RatioGauge type for some of the hit rate stuff.
          Hide
          Yuki Morishita added a comment -

          Attaching patch for review.

          • Removed EndpointMetrics because I felt those are not metrics. You can get equivalent info from FailureDetectorMBean's AllEndpointStates.
          • Some metrics now are Counters.
          • Deprecated MBeans' attributes (nodetool still uses deprecated interface. I want to do that in separate ticket).

          on github: https://github.com/yukim/cassandra/tree/4009

          Also updated doc: https://gist.github.com/3013075

          Show
          Yuki Morishita added a comment - Attaching patch for review. Removed EndpointMetrics because I felt those are not metrics. You can get equivalent info from FailureDetectorMBean's AllEndpointStates. Some metrics now are Counters. Deprecated MBeans' attributes (nodetool still uses deprecated interface. I want to do that in separate ticket). on github: https://github.com/yukim/cassandra/tree/4009 Also updated doc: https://gist.github.com/3013075
          Hide
          Chris Burroughs added a comment -

          > We're using Metrics with Graphite which is quite nice. Any chance to add an option for reporters?

          I think this is totally the right thing to do and allows C* to behave just like every other metrics instrumented application. I think that's a separate question from "what are the right metrics to have" though so I created CASSANDRA-4430

          Show
          Chris Burroughs added a comment - > We're using Metrics with Graphite which is quite nice. Any chance to add an option for reporters? I think this is totally the right thing to do and allows C* to behave just like every other metrics instrumented application. I think that's a separate question from "what are the right metrics to have" though so I created CASSANDRA-4430
          Hide
          Nick Bailey added a comment -

          I just glanced at this briefly, but I'll note that a few of the new metrics look like they could be Counters instead of gauges. Don't really know if that has any actual effect on performance or anything like that though.

          Show
          Nick Bailey added a comment - I just glanced at this briefly, but I'll note that a few of the new metrics look like they could be Counters instead of gauges. Don't really know if that has any actual effect on performance or anything like that though.
          Hide
          Jonathan Ellis added a comment -

          If you mean, have C* report directly to a collector, I'd rather keep the pull model

          Show
          Jonathan Ellis added a comment - If you mean, have C* report directly to a collector, I'd rather keep the pull model
          Hide
          Daniel Doubleday added a comment -

          We're using Metrics with Graphite which is quite nice. Any chance to add an option for reporters?

          Show
          Daniel Doubleday added a comment - We're using Metrics with Graphite which is quite nice. Any chance to add an option for reporters?
          Hide
          Yuki Morishita added a comment -

          My first attempt attached. Basically, I extracted what it seems like 'Metrics' from current MBeans and re-organized them under o.a.c.metrics package. I uploaded list of metrics here(still wip, tho): https://gist.github.com/3013075

          Patch adds new metric classes, but I still did not completely replace old ones, nor mark as deprecated. I will do this if new metric classes seem OK.

          What bothers me right now is that I extracted endpoint states from FailureDetector#getAllEndpointStates to EndpointStateMetrics, but I don't think we are going to need it.

          If anything seems missing/not needed, please let me know.

          Show
          Yuki Morishita added a comment - My first attempt attached. Basically, I extracted what it seems like 'Metrics' from current MBeans and re-organized them under o.a.c.metrics package. I uploaded list of metrics here(still wip, tho): https://gist.github.com/3013075 Patch adds new metric classes, but I still did not completely replace old ones, nor mark as deprecated. I will do this if new metric classes seem OK. What bothers me right now is that I extracted endpoint states from FailureDetector#getAllEndpointStates to EndpointStateMetrics, but I don't think we are going to need it. If anything seems missing/not needed, please let me know.
          Hide
          Brandon Williams added a comment -

          This sounds good, but let's be careful about making things clear for the average user, for instance 'LatencyMetrics' seems a bit vague and nondescript.

          Show
          Brandon Williams added a comment - This sounds good, but let's be careful about making things clear for the average user, for instance 'LatencyMetrics' seems a bit vague and nondescript.
          Hide
          Yuki Morishita added a comment -

          Here is my current plan:

          • Separate out metrics from current MBeans and move them to o.a.c.metrics
            Currently most of MBeans have metrics, configurations and operations in the same place. Our first step is to pull out metrics and reconstruct them into o.a.c.metrics using Codahale Metrics. Current metrics interfaces will be marked as deprecated but not removed. Configurations(properties like minCompactionThreshold, which you can update through JMX) and operations stay as they are.
          • Migrate tools to use new o.a.c.metrics

          And some versions later,

          • Remove deprecated interface

          o.a.c.metrics will have following classes.

          • CacheMetrics
            Cache metrics for each key/row cache. Metrics from CacheServiceMBean go here.
          • ClientRequestMetrics (from 1.1.0)
            Port metrics from StorageProxy to this metrics.
          • ColumnFamilyMetrics
            Metrics for each column family. Metrics from ColumnFamilyStoreMBean go here.
          • CommitLogMetrics
            Metrics from CommitLogMBean go here.
          • CompactionMetrics
            Metrics from CompactionManagerMBean go here.
          • LatencyMetrics
            Replacement for current LatencyTracker. Used by other metrics.
          • MessagingMetrics
            Metrics from MessagingServiceMBean go here.
          • ThreadPoolMetrics
            Metrics from JMXEnabledThreadPoolExecutorMBean go here.
          • FailureDetectorMetrics, StreamingMetrics, DynamicEndpointSnitchMetrics
            and so on...

          Note that Codahale Metrics has own Histogram object but I'm thinking to stick with current EstimatedHistogram since it's hard to implement recent histogram (like recentSSTablesPerReadHistogram) which you have to reset histogram after reading its values.

          Show
          Yuki Morishita added a comment - Here is my current plan: Separate out metrics from current MBeans and move them to o.a.c.metrics Currently most of MBeans have metrics, configurations and operations in the same place. Our first step is to pull out metrics and reconstruct them into o.a.c.metrics using Codahale Metrics. Current metrics interfaces will be marked as deprecated but not removed. Configurations(properties like minCompactionThreshold, which you can update through JMX) and operations stay as they are. Migrate tools to use new o.a.c.metrics And some versions later, Remove deprecated interface o.a.c.metrics will have following classes. CacheMetrics Cache metrics for each key/row cache. Metrics from CacheServiceMBean go here. ClientRequestMetrics (from 1.1.0) Port metrics from StorageProxy to this metrics. ColumnFamilyMetrics Metrics for each column family. Metrics from ColumnFamilyStoreMBean go here. CommitLogMetrics Metrics from CommitLogMBean go here. CompactionMetrics Metrics from CompactionManagerMBean go here. LatencyMetrics Replacement for current LatencyTracker. Used by other metrics. MessagingMetrics Metrics from MessagingServiceMBean go here. ThreadPoolMetrics Metrics from JMXEnabledThreadPoolExecutorMBean go here. FailureDetectorMetrics, StreamingMetrics, DynamicEndpointSnitchMetrics and so on... Note that Codahale Metrics has own Histogram object but I'm thinking to stick with current EstimatedHistogram since it's hard to implement recent histogram (like recentSSTablesPerReadHistogram) which you have to reset histogram after reading its values.
          Hide
          Jonathan Ellis added a comment -

          I'd like to flesh out the plan here before kicking into implementation.

          Show
          Jonathan Ellis added a comment - I'd like to flesh out the plan here before kicking into implementation.

            People

            • Assignee:
              Yuki Morishita
              Reporter:
              Brandon Williams
              Reviewer:
              Brandon Williams
            • Votes:
              1 Vote for this issue
              Watchers:
              7 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development