  1. Apache Cassandra
  2. CASSANDRA-6539

Track metrics at a keyspace level as well as column family level

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Low
    • Resolution: Fixed
    • Fix Version/s: 1.2.17, 2.0.9, 2.1 rc2
    • None

    Description

      It would be useful to be able to see aggregated metrics (write/read count/latency) at a keyspace level as well as at the individual column family level.

      Attachments

        1. 6539-1.2.txt
          8 kB
          Brandon Williams
        2. 6539-2.0.txt
          10 kB
          Brandon Williams

          Activity

            nickmbailey Nick Bailey added a comment -

            To be a bit clearer, this is most useful for data models and clusters with a very large number of column families per keyspace (hundreds or thousands). Tracking only individual column families can be burdensome at that scale.

            jbellis Jonathan Ellis added a comment -

            I'm not sure why having C* do the aggregation is better than having a monitoring service do it.

            nickmbailey Nick Bailey added a comment -

            It's just slightly easier to do it in C*. The metrics library does nice things like automatically expose the metric data in multiple ways. And like I said, in the case of huge numbers of column families (thousands and above), it's less burdensome to do the aggregation in C* than on the client.


            brandon.williams Brandon Williams added a comment -

            Initial patches against 1.2 and 2.0. I couldn't find a way to customize most of the 'fancy' metrics classes like Histogram and Counter, so I made them gauges and either summed or averaged as appropriate. I'm not sure what to do about the things using LatencyMetrics, so I bailed on that for now.
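For readers following the thread, the gauge approach described above can be sketched roughly as follows. This is a minimal illustration only, not the attached patch: `KeyspaceGaugeSketch`, `CfMetrics`, and the two value names are hypothetical stand-ins, and plain `LongSupplier`s take the place of the metrics library's gauge type so the example stays self-contained.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.LongSupplier;

// Hypothetical sketch: keyspace-level values computed on read by summing
// the per-column-family values, in the spirit of a gauge.
final class KeyspaceGaugeSketch {

    // Hypothetical per-column-family view exposing two summable values.
    static final class CfMetrics {
        final LongSupplier memtableDataSize;
        final LongSupplier liveDiskSpaceUsed;

        CfMetrics(LongSupplier memtableDataSize, LongSupplier liveDiskSpaceUsed) {
            this.memtableDataSize = memtableDataSize;
            this.liveDiskSpaceUsed = liveDiskSpaceUsed;
        }
    }

    private final List<CfMetrics> columnFamilies;

    KeyspaceGaugeSketch(List<CfMetrics> columnFamilies) {
        this.columnFamilies = columnFamilies;
    }

    // Nothing is stored at the keyspace level; each read walks the CFs.
    long memtableDataSize() {
        long total = 0;
        for (CfMetrics cf : columnFamilies) {
            total += cf.memtableDataSize.getAsLong();
        }
        return total;
    }

    long liveDiskSpaceUsed() {
        long total = 0;
        for (CfMetrics cf : columnFamilies) {
            total += cf.liveDiskSpaceUsed.getAsLong();
        }
        return total;
    }

    public static void main(String[] args) {
        CfMetrics users = new CfMetrics(() -> 100L, () -> 4_000L);
        CfMetrics events = new CfMetrics(() -> 250L, () -> 6_000L);
        KeyspaceGaugeSketch ks = new KeyspaceGaugeSketch(Arrays.asList(users, events));

        System.out.println(ks.memtableDataSize());   // 350
        System.out.println(ks.liveDiskSpaceUsed());  // 10000
    }
}
```

Building only on the per-CF values, rather than reaching into internals, keeps the keyspace view a thin layer over whatever the CF metrics already expose.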

            brandon.williams Brandon Williams added a comment -

            I also did not reach into internals for anything purposefully, but instead built on the cf metrics in an effort to hopefully keep future maintenance localized there.

            brandon.williams Brandon Williams added a comment -

            WDYT yukim?

            yukim Yuki Morishita added a comment -

            You need to instantiate the KeyspaceMetrics object at Keyspace creation and discard (release) it when the keyspace is closed. Otherwise the metrics won't show up.

            I honestly don't get the point of trying to aggregate all metrics from the CFs. Total memtable/SSTable/BF sizes are fine, but others like max row size/latency/BF fp ratio, not so much.
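The lifecycle point above (create the metrics object with the keyspace, release it on close) might look something like the sketch below. The `REGISTRY` map, the metric name format, and both nested classes are assumptions made for illustration; they are not Cassandra's actual classes or naming scheme.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.LongSupplier;

// Hypothetical sketch of tying metric registration to the keyspace lifecycle.
final class KeyspaceLifecycleSketch {

    // Stand-in for a metrics registry: metric name -> value supplier.
    static final Map<String, LongSupplier> REGISTRY = new ConcurrentHashMap<>();

    static final class KeyspaceMetrics {
        private final List<String> registeredNames = new ArrayList<>();

        KeyspaceMetrics(String keyspaceName, LongSupplier memtableDataSize) {
            // Registering in the constructor is what makes the metric visible.
            register("keyspace." + keyspaceName + ".MemtableDataSize", memtableDataSize);
        }

        private void register(String name, LongSupplier value) {
            REGISTRY.put(name, value);
            registeredNames.add(name);
        }

        // Remove everything this instance registered.
        void release() {
            for (String name : registeredNames) {
                REGISTRY.remove(name);
            }
            registeredNames.clear();
        }
    }

    static final class Keyspace implements AutoCloseable {
        final KeyspaceMetrics metrics;

        Keyspace(String name) {
            // Metrics are instantiated as part of keyspace creation.
            this.metrics = new KeyspaceMetrics(name, () -> 0L);
        }

        @Override
        public void close() {
            // Metrics are released when the keyspace goes away.
            metrics.release();
        }
    }

    public static void main(String[] args) {
        Keyspace ks = new Keyspace("demo");
        System.out.println(REGISTRY.containsKey("keyspace.demo.MemtableDataSize")); // true
        ks.close();
        System.out.println(REGISTRY.containsKey("keyspace.demo.MemtableDataSize")); // false
    }
}
```

If the metrics object is never constructed, nothing is registered, which matches the symptom described: the metrics simply do not show up.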

            brandon.williams Brandon Williams added a comment -

            I think you're right. In the updated patch I pared it down, dropping anything that wasn't a simple sum or didn't make sense.

            yukim Yuki Morishita added a comment -

            +1


            brandon.williams Brandon Williams added a comment -

            Committed.

            rfwagner@gmail.com Richard Wagner added a comment -

            I think it is actually quite useful to have ALL the ColumnFamily metrics aggregated up to the keyspace level (and then up to the global level per CASSANDRA-7273). Much in the way that we have a complete set of metrics available at the StorageProxy level, including latencies, that span all keyspaces and column families, these metrics are quite useful in general at the storage node level. In our case, we have a downstream monitoring cluster that we send metrics to. To build a generic, multi-tenant monitoring solution we have to either 1) aggregate all the CF metrics and present a "global" set of metrics or 2) capture metrics for ALL column families for all tenants. Option 2) is prohibitively expensive for us, especially considering the small incremental benefit we see in practice from having this information broken down to the specific CF level. Almost always, we can diagnose issues with a global CF view of metrics. But going with solution 1), it is important that we get access to all the metrics.

            The way I think of it, conceptually, you could have 3 complete and identical sets of metrics widgets instantiated: CF, Keyspace and Global. Every time something measurable happens, you adjust the corresponding metric widget at all 3 levels.

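The three-level idea in the comment above (identical metric sets at CF, keyspace, and global scope, all updated on each event) could be sketched along these lines. Everything here is hypothetical: `Level`, `Scope`, and `recordWrite` are illustrative names, and a mean derived from two `LongAdder`s stands in for a real latency metric.

```java
import java.util.concurrent.atomic.LongAdder;

// Hypothetical sketch: one identical set of counters per level, with every
// measured event applied to the CF, keyspace, and global sets together.
final class ThreeLevelMetricsSketch {

    // The same "widget set" is instantiated at each level.
    static final class Level {
        final LongAdder writes = new LongAdder();
        final LongAdder totalWriteLatencyMicros = new LongAdder();

        void record(long latencyMicros) {
            writes.increment();
            totalWriteLatencyMicros.add(latencyMicros);
        }

        double meanWriteLatencyMicros() {
            long count = writes.sum();
            return count == 0 ? 0.0 : (double) totalWriteLatencyMicros.sum() / count;
        }
    }

    // Binds one column family to its enclosing keyspace and the global set.
    static final class Scope {
        final Level columnFamily;
        final Level keyspace;
        final Level global;

        Scope(Level columnFamily, Level keyspace, Level global) {
            this.columnFamily = columnFamily;
            this.keyspace = keyspace;
            this.global = global;
        }

        // A single event adjusts the corresponding metric at all three levels.
        void recordWrite(long latencyMicros) {
            columnFamily.record(latencyMicros);
            keyspace.record(latencyMicros);
            global.record(latencyMicros);
        }
    }

    public static void main(String[] args) {
        Level global = new Level();
        Level keyspace = new Level();
        Scope users = new Scope(new Level(), keyspace, global);
        Scope events = new Scope(new Level(), keyspace, global);

        users.recordWrite(100);
        events.recordWrite(300);

        System.out.println(users.columnFamily.writes.sum());    // 1
        System.out.println(keyspace.writes.sum());              // 2
        System.out.println(keyspace.meanWriteLatencyMicros());  // 200.0
    }
}
```

Unlike summing on read, this approach pays a small cost on every event, but it makes latency-style metrics available at each level without averaging averages.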

            People

              brandon.williams Brandon Williams
              nickmbailey Nick Bailey
              Brandon Williams
              Yuki Morishita
              Votes: 1
              Watchers: 5
