Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Normal
    • Resolution: Fixed
    • Fix Version/s: 3.0.0 rc2
    • None
    • None

    Description

      We need to add more metrics to help understand where time is spent in materialized view writes. We currently track the ratio of async base -> view mutations that fail.

      We should also add

      • The amount of time spent waiting for the partition lock (contention)
      • The amount of time spent reading data
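To make the two proposed measurements concrete, here is a minimal sketch of timing the partition-lock wait and the local read separately. It is not taken from the attached patches: `ViewWriteTimings`, `timedRead`, and the plain `LongAdder` totals are hypothetical stand-ins for whatever timer type the real metrics use.

```java
import java.util.concurrent.atomic.LongAdder;
import java.util.concurrent.locks.Lock;
import java.util.function.Supplier;

// Hypothetical helper for illustration only; not Cassandra's API.
final class ViewWriteTimings {
    final LongAdder lockWaitNanos = new LongAdder(); // time blocked on the partition lock
    final LongAdder readNanos = new LongAdder();     // time spent in the local read
    final LongAdder samples = new LongAdder();       // number of timed operations

    // Acquires the lock, runs the read while holding it, and records both durations.
    // If the read throws, only the lock wait is recorded for that call.
    <T> T timedRead(Lock partitionLock, Supplier<T> localRead) {
        long beforeLock = System.nanoTime();
        partitionLock.lock();
        long afterLock = System.nanoTime();
        try {
            T result = localRead.get();
            readNanos.add(System.nanoTime() - afterLock);
            return result;
        } finally {
            lockWaitNanos.add(afterLock - beforeLock);
            samples.increment();
            partitionLock.unlock();
        }
    }
}
```

Keeping the two totals separate is the point: a large lock-wait total suggests contention on the partition, while a large read total points at the read-before-write itself.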

      Any others?

      carlyeks jkni

      Attachments

        1. trunk-10323.txt
          14 kB
          Chris Lohfink
        2. trunk-10323-v2.txt
          23 kB
          Chris Lohfink

        Activity

          jkni Joel Knighton added a comment -

          I'd be interested to see counters for batchlog entries created/removed by a coordinator. Effectively, how often does batchlog replay kick in as a mechanism to work toward quorum writes?

          aleksey Aleksey Yeschenko added a comment -

          It's possible to do that quite efficiently now that CASSANDRA-9673 is in.

          Commented about it here.

          That's metrics. Batchlog however is not a mechanism for achieving quorum writes and shouldn't be seen as such.

          jkni Joel Knighton added a comment -

          Agreed re: paragraph 3. I'll watch that more closely in the future.

          brandon.williams Brandon Williams added a comment -

          Interested, cnlwsu?

          philipthompson Philip Thompson added a comment -

          rcoli raised the good point that a very helpful metric for operators would be to track MV lag, because mutations are applied to the view from the base table asynchronously.

          cnlwsu Chris Lohfink added a comment -

          I'll give it a try.

          rcoli Robert Coli added a comment -

          "Robert Coli raised the good point that a very helpful metric for operators would be to track MV lag, because mutations are applied to the view from the base table asynchronously."

          Probably this ends up being expressed as "queue depth" as opposed to actual temporal lag, but "queue depth" provides useful visibility into "lag."

          cnlwsu Chris Lohfink added a comment -

          Would taking attempted replicas minus successful replicas work for that? I can add a gauge to make it easier.

          rcoli Robert Coli added a comment -

          "Would taking attempted replicas minus successful replicas work for that? I can add a gauge to make it easier."

          That's a measure of how many retries you've ever had, not a measure of how many retries are in the queue now, isn't it? To me the latter is more valuable than the former.
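The distinction being drawn here can be sketched as follows. This is an illustration under stated assumptions, not code from the patch: `ViewReplicaWriteTracker` and its callbacks are hypothetical, and the sketch assumes a failed write is never later counted as a success. Under that assumption, "attempted minus succeeded" keeps every failure forever, while a separate in-flight gauge only reflects writes that are outstanding right now.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical tracker for illustration only; not Cassandra's API.
final class ViewReplicaWriteTracker {
    final LongAdder attempted = new LongAdder();  // cumulative: view replica writes sent
    final LongAdder succeeded = new LongAdder();  // cumulative: view replica writes acknowledged
    final AtomicLong inFlight = new AtomicLong(); // current: sent but not yet resolved

    void onSend()    { attempted.increment(); inFlight.incrementAndGet(); }
    void onSuccess() { succeeded.increment(); inFlight.decrementAndGet(); }
    void onFailure() { inFlight.decrementAndGet(); } // resolved, but never counted as a success

    // Includes every write that has ever failed, plus whatever is outstanding now.
    long attemptedMinusSucceeded() { return attempted.sum() - succeeded.sum(); }

    // Only what is outstanding at this moment ("queue depth").
    long pendingNow() { return inFlight.get(); }
}
```

After three sends, one success, and one failure, `attemptedMinusSucceeded()` is 2 while `pendingNow()` is 1; the gap between the two grows with each failure.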

          cnlwsu Chris Lohfink added a comment -

          It's a measure of how many mutations are outstanding to replicas of the MV's partitions during or after the base replica's mutation.

          cnlwsu Chris Lohfink added a comment -

          I have it recording the time from when the base mutation is applied to the memtable until CL.ONE is achieved on the async write to the MVs (I think). I added a "pending" metric, but it's global, not per table. It also tracks, per table, how much time is spent on the local read and how long it takes to acquire the partition lock.
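A rough sketch of the lag measurement described in this comment, assuming that "CL.ONE achieved" means the first acknowledgement from any view replica. All names here (`ViewWriteLag`, `baseMutationApplied`, `onViewReplicaAck`) are hypothetical, and timestamps are passed in explicitly so the behaviour is easy to follow; the actual patch may be structured differently.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical per-table recorder for illustration only; not Cassandra's API.
final class ViewWriteLag {
    // Global, not per table, mirroring the "pending" metric described in the comment.
    static final AtomicLong PENDING = new AtomicLong();

    final LongAdder totalLagNanos = new LongAdder();
    final LongAdder completed = new LongAdder();

    // One handle per base mutation whose view writes are still outstanding.
    final class InFlight {
        private final long startNanos;
        private final AtomicBoolean done = new AtomicBoolean();

        InFlight(long startNanos) { this.startNanos = startNanos; }

        // Only the first acknowledgement counts; later acks are ignored.
        void onViewReplicaAck(long nowNanos) {
            if (done.compareAndSet(false, true)) {
                totalLagNanos.add(nowNanos - startNanos);
                completed.increment();
                PENDING.decrementAndGet();
            }
        }
    }

    // Call when the base mutation has been applied to the memtable.
    InFlight baseMutationApplied(long nowNanos) {
        PENDING.incrementAndGet();
        return new InFlight(nowNanos);
    }
}
```

In real use the timestamps would come from `System.nanoTime()`; a write that is never acknowledged stays in `PENDING`, which is what makes the counter usable as a queue-depth signal.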

          carlyeks Carl Yeksigian added a comment -

          Overall +1. I pushed a branch with just a few nits; cnlwsu, could you take a look at them?

          • Removed the overload of mutateMV since we haven't hit a stable release with MV in yet.
          • Don't initialize the viewLockAcquire and viewRead metrics if this is a view. I was expecting those values to be updated when I was looking at the metrics; probably makes sense to just not have them at all for views instead of having the unused metrics.
          • Formatting miscellany
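The second nit above can be illustrated with a small sketch. `TableMetricsSketch` is hypothetical, and `LongAdder` merely stands in for the real metric type; only the names `viewLockAcquire` and `viewRead` come from the comment. The idea is that a view never takes the base-table write path that updates these values, so they are simply not created for views.

```java
import java.util.concurrent.atomic.LongAdder;

// Hypothetical metrics holder for illustration only; not Cassandra's TableMetrics.
final class TableMetricsSketch {
    final LongAdder writes = new LongAdder(); // present for every table

    // Only meaningful for base tables; left null for views so that no
    // permanently-zero metrics are exposed.
    final LongAdder viewLockAcquire;
    final LongAdder viewRead;

    TableMetricsSketch(boolean isView) {
        if (isView) {
            viewLockAcquire = null;
            viewRead = null;
        } else {
            viewLockAcquire = new LongAdder();
            viewRead = new LongAdder();
        }
    }
}
```

Callers on the base-table write path can use the fields directly; anything that enumerates metrics for display has to skip the null entries for views.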
          cnlwsu Chris Lohfink added a comment -

          Changes look good to me

          slebresne Sylvain Lebresne added a comment -

          Committed, thanks.

          People

            Assignee: cnlwsu Chris Lohfink
            Reporter: tjake T Jake Luciani
            Authors: Chris Lohfink
            Reviewers: Carl Yeksigian
            Votes: 0
            Watchers: 8