Details
-
Improvement
-
Status: Resolved
-
Major
-
Resolution: Fixed
-
None
-
None
Description
This is a proposal to provide a new metrics reporting API based on Coda Hale's metrics library (AKA Dropwizard/Yammer metrics).
Background
In a discussion on the dev@ mailing list a number of community and PMC members recommended replacing Storm’s metrics system with a new API as opposed to enhancing the existing metrics system. Some of the objections to the existing metrics API include:
- Metrics are reported as an untyped Java object, making it very difficult to reason about how to report it (e.g. is it a gauge, a counter, etc.?)
- It is difficult to determine if metrics coming into the consumer are pre-aggregated or not.
- Storm’s metrics collection occurs through a specialized bolt, which in addition to potentially affecting system performance, complicates certain types of aggregation when the parallelism of that bolt is greater than one.
In the discussion on the developer mailing list, there is growing consensus for replacing Storm’s metrics API with a new API based on Coda Hale’s metrics library. This approach has the following benefits:
- Coda Hale’s metrics library is very stable, performant, well thought out, and widely adopted among open source projects (e.g. Kafka).
- The metrics library provides many existing metric types: Meters, Gauges, Counters, Histograms, and more.
- The library has a pluggable “reporter” API for publishing metrics to various systems, with existing implementations for: JMX, console, CSV, SLF4J, Graphite, Ganglia.
- Reporters are straightforward to implement, and can be reused by any project that uses the metrics library (i.e. would have broader application outside of Storm)
As noted earlier, the metrics library supports pluggable reporters for sending metrics data to other systems, and implementing a reporter is fairly straightforward (an example reporter implementation can be found here). For example if someone develops a reporter based on Coda Hale’s metrics, it could not only be used for pushing Storm metrics, but also for any system that used the metrics library, such as Kafka.
Scope of Effort
The effort to implement a new metrics API for Storm can be broken down into the following development areas:
- Implement API for Storms internal worker metrics: latencies, queue sizes, capacity, etc.
- Implement API for user defined, topology-specific metrics (exposed via the org.apache.storm.task.TopologyContext class)
- Implement API for storm daemons: nimbus, supervisor, etc.
Relationship to Existing Metrics
This would be a new API that would not affect the existing metrics API. Upon completion, the old metrics API would presumably be deprecated, but kept in place for backward compatibility.
Internally the current metrics API uses Storm bolts for the reporting mechanism. The proposed metrics API would not depend on any of Storm's messaging capabilities and instead use the metrics library's built-in reporter mechanism . This would allow users to use existing Reporter implementations which are not Storm-specific, and would simplify the process of collecting metrics. Compared to Storm's IMetricCollector interface, implementing a reporter for the metrics library is much more straightforward (an example can be found here .
The new metrics capability would not use or affect the ZooKeeper-based metrics used by Storm UI.
Relationship to JStorm Metrics
[TBD]
Target Branches
[TBD]
Performance Implications
[TBD]
Metrics Namespaces
[TBD]
Metrics Collected
Worker
Namespace | Metric Type | Description |
---|
Nimbus
Namespace | Metric Type | Description |
---|
Supervisor
Namespace | Metric Type | Description |
---|
User-Defined Metrics
[TBD]
Attachments
Issue Links
- contains
-
STORM-2160 Expose interface to MetricRegistry via TopologyContext
- Resolved
-
STORM-2164 Create simple generic plugin system to register codahale reporters
- Resolved
-
STORM-2156 Add RocksDB instance in Nimbus and write heartbeat based metrics to it
- Closed
-
STORM-2159 Codahale-ize Executor and Worker builtin-in stats
- Closed
-
STORM-2165 Implement default worker metrics reporter
- Closed
-
STORM-2167 Decide on a query language the UI can use to query metrics
- Closed
-
STORM-2168 Deprecate metrics going through Zookeeper from workers
- Closed
-
STORM-2169 Define Naming Convention for Metric Namespaces
- Closed
- duplicates
-
STORM-1329 Evaluate/Port JStorm metrics system
- Closed
- is cloned by
-
STORM-2909 New Metrics Reporting API - for 2.0.0
- Resolved
- relates to
-
STORM-2162 Expand set of codahale Metrics for Supervisor
- Closed
-
STORM-2163 Expand set of codahale Metrics for Nimbus
- Closed
-
STORM-2166 Expand set of codahale metrics for UI
- Closed
- links to
ptgoetz,
Just a small correction here: "The proposed metrics API would depend on any of Storm's messaging capabilities...": I think you mean "would NOT depend".
I am not sure codahale answers one part that I am unclear about, and that is how workers will send their metrics over. If they interact with codahale directly, or whether there is some collection/aggregation in the middle.
I'll read more on how Kafka uses it to see if that helps.