[CASSANDRA-6539] Track metrics at a keyspace level as well as column family level - ASF JIRA

Details

Type: Improvement
Status: Resolved
Priority: Low
Resolution: Fixed
Fix Version/s: 1.2.17, 2.0.9, 2.1 rc2
Component/s: None
Labels:
- lhf

Description

It would be useful to be able to see aggregated metrics (write/read count/latency) at a keyspace level as well as at the individual column family level.

Attachments

- 6539-1.2.txt (8 kB), Brandon Williams, 12/Jun/14 16:54
- 6539-2.0.txt (10 kB), Brandon Williams, 12/Jun/14 16:54

Issue Links

is depended upon by

CASSANDRA-7273 expose global ColumnFamily metrics

Resolved

relates to

CASSANDRA-7273 expose global ColumnFamily metrics

Resolved

Activity

Nick Bailey added a comment - 02/Jan/14 19:21

To be a bit clearer, this is more useful for data models and clusters where there is a very large number of column families per keyspace (hundreds or thousands). Tracking only individual column families can be burdensome at that level.

Jonathan Ellis added a comment - 14/Mar/14 04:42

I'm not sure why having C* do the aggregation is better than having a monitoring service do it.

Nick Bailey added a comment - 14/Mar/14 17:47

It's just slightly easier to do it in C*. The metrics library does nice things like automatically expose the metric data in multiple ways. And like I said, in the case of huge numbers of column families (thousands and above), it's less burdensome to do the aggregation in C* than on the client.

Brandon Williams added a comment - 30/May/14 23:19

Initial patches against 1.2 and 2.0. I couldn't find a way to customize most of the 'fancy' metrics classes like Histogram and Counter, so I made them gauges and either summed or averaged as appropriate. I'm not sure what to do about the things using LatencyMetrics, so I bailed on that for now.
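
A minimal sketch of the gauge approach described above, not the code from the attached patches: a keyspace-level gauge that re-sums one per-column-family value each time it is read. `CfMetrics`, `sumGauge`, and the field names are hypothetical stand-ins, and `LongSupplier` stands in for the metrics library's gauge type so the example needs only the JDK.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.LongSupplier;
import java.util.function.ToLongFunction;

public class KeyspaceGaugeSketch {
    /** Hypothetical stand-in for one column family's metrics. */
    public static final class CfMetrics {
        public volatile long memtableDataSize;
        public volatile long liveDiskSpaceUsed;

        public CfMetrics(long memtableDataSize, long liveDiskSpaceUsed) {
            this.memtableDataSize = memtableDataSize;
            this.liveDiskSpaceUsed = liveDiskSpaceUsed;
        }
    }

    /** Returns a gauge that sums the chosen metric over all column families on every read. */
    public static LongSupplier sumGauge(List<CfMetrics> cfs, ToLongFunction<CfMetrics> metric) {
        return () -> {
            long total = 0;
            for (CfMetrics cf : cfs) {
                total += metric.applyAsLong(cf);
            }
            return total;
        };
    }

    public static void main(String[] args) {
        List<CfMetrics> cfs = new ArrayList<>();
        cfs.add(new CfMetrics(100, 1_000));
        cfs.add(new CfMetrics(250, 4_000));

        LongSupplier keyspaceMemtableSize = sumGauge(cfs, cf -> cf.memtableDataSize);
        System.out.println(keyspaceMemtableSize.getAsLong()); // 350

        // The gauge holds no state of its own, so a per-CF change shows up on the next read.
        cfs.get(0).memtableDataSize = 500;
        System.out.println(keyspaceMemtableSize.getAsLong()); // 750
    }
}
```

Because the gauge only reads the per-CF values, nothing new has to be updated on the write path; the cost is a pass over all column families on each read of the keyspace metric.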

Brandon Williams added a comment - 30/May/14 23:21

I also purposefully did not reach into internals for anything, but instead built on the CF metrics in an effort to keep future maintenance localized there.

Brandon Williams added a comment - 09/Jun/14 21:23

WDYT yukim?

Yuki Morishita added a comment - 10/Jun/14 17:35

You need to instantiate the KeyspaceMetrics object at Keyspace creation and discard (release) it when the Keyspace is closed. Otherwise metrics won't show up.

I honestly don't get the point of trying to aggregate all metrics from CF. Total memtable/SSTable/BF sizes are fine, but others like max row size/latency/BF fp ratio, not so much.
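
A hedged sketch of the lifecycle point above, assuming a simple name-to-gauge map as the registry: metrics become visible only once the keyspace-level holder is constructed, and they are removed again on release. The classes here are hypothetical illustrations, not Cassandra's actual `Keyspace` or `KeyspaceMetrics`.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.LongSupplier;

public class KeyspaceMetricsLifecycleSketch {
    /** Hypothetical stand-in for the registry that exposes gauges to monitoring. */
    public static final Map<String, LongSupplier> REGISTRY = new ConcurrentHashMap<>();

    /** Hypothetical keyspace-level metrics holder: registers on construction. */
    public static final class KeyspaceMetrics {
        private final String gaugeName;

        public KeyspaceMetrics(String keyspaceName, LongSupplier memtableDataSize) {
            this.gaugeName = "Keyspace." + keyspaceName + ".MemtableDataSize";
            REGISTRY.put(gaugeName, memtableDataSize);
        }

        /** Unregisters the gauge so a closed keyspace stops reporting. */
        public void release() {
            REGISTRY.remove(gaugeName);
        }
    }

    /** Hypothetical keyspace that owns its metrics for its whole lifetime. */
    public static final class Keyspace {
        public final KeyspaceMetrics metric;

        public Keyspace(String name) {
            // Created together with the keyspace; without this, nothing is registered.
            this.metric = new KeyspaceMetrics(name, () -> 0L);
        }

        public void close() {
            metric.release();
        }
    }
}
```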

Brandon Williams added a comment - 12/Jun/14 00:28

I think you're right. In the updated patch I pared it down, dropping anything that wasn't a simple sum or didn't make sense.

Yuki Morishita added a comment - 12/Jun/14 23:40

+1

Brandon Williams added a comment - 14/Jun/14 02:40

Committed.

Richard Wagner added a comment - 17/Jun/14 00:48

I think it is actually quite useful to have ALL the ColumnFamily metrics aggregated up to the keyspace level (and then up to the global level per CASSANDRA-7273). Much in the way that we have a complete set of metrics available at the StorageProxy level - including latencies - that are across all keyspaces and column families, these metrics are quite useful in general at the storage node level. In our case, we have a downstream monitoring cluster that we send metrics to. To build a generic, multi-tenant monitoring solution we have to either 1) aggregate all the CF metrics and present a "global" set of metrics or 2) capture metrics for ALL column families for all tenants. Option 2) is prohibitively expensive for us, especially considering the small incremental benefit we see in practice from having this information broken down to the specific CF level. Almost always, we can diagnose issues with a global CF view of metrics. But going with solution 1), it is important that we get access to all the metrics.

The way I think of it, conceptually, you could have 3 complete and identical sets of metrics widgets instantiated: CF, Keyspace and Global. Every time something measurable happens, you adjust the corresponding metric widget at all 3 levels.
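
The three-level idea above could look roughly like the following sketch, which is an illustration and not what was committed for this ticket: one write counter per column family, per keyspace, and globally, all adjusted by the same event. The class and method names are hypothetical, and plain `LongAdder` counters stand in for the metrics library's types.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

public class ThreeLevelMetricsSketch {
    private final LongAdder globalWrites = new LongAdder();
    private final Map<String, LongAdder> keyspaceWrites = new ConcurrentHashMap<>();
    private final Map<String, LongAdder> cfWrites = new ConcurrentHashMap<>();

    /** Records one write at all three levels: CF, keyspace, and global. */
    public void recordWrite(String keyspace, String columnFamily) {
        cfWrites.computeIfAbsent(keyspace + "." + columnFamily, k -> new LongAdder()).increment();
        keyspaceWrites.computeIfAbsent(keyspace, k -> new LongAdder()).increment();
        globalWrites.increment();
    }

    public long cfCount(String keyspace, String columnFamily) {
        LongAdder adder = cfWrites.get(keyspace + "." + columnFamily);
        return adder == null ? 0 : adder.sum();
    }

    public long keyspaceCount(String keyspace) {
        LongAdder adder = keyspaceWrites.get(keyspace);
        return adder == null ? 0 : adder.sum();
    }

    public long globalCount() {
        return globalWrites.sum();
    }
}
```

Updating every level at event time keeps reads cheap and lets each level carry its own full metric type, at the price of extra work on every measured operation.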

People

Assignee: Brandon Williams

Reporter: Nick Bailey

Authors: Brandon Williams

Reviewers: Yuki Morishita

Votes: 1

Watchers: 5

Dates

Created: 02/Jan/14 19:09

Updated: 16/Apr/19 09:31

Resolved: 14/Jun/14 02:40
