CASSANDRA-15821

Metrics Documentation Enhancements

    Details

    • Type: Improvement
    • Status: Open
    • Priority: Normal
    • Resolution: Unresolved
    • Fix Version/s: 4.0.x
    • Component/s: Documentation/Website
    • Labels:
      None
    • Change Category:
      Code Clarity
    • Complexity:
      Low Hanging Fruit
    • Platform:
      All
    • Impacts:
      None

      Description

      CASSANDRA-15582 covers quality around metrics, and it was mentioned there that reviewing and improving the metrics documentation would fall within its scope. Please consider the following analysis when deciding what improvements to make here:

      Please see this spreadsheet, which itemizes almost all of Cassandra's metrics and notes whether each one is documented (along with other notes). The spreadsheet covers "almost all" because some metrics do not seem to initialize as part of Cassandra startup (I was able to trigger some of them, but how to trigger the rest was not immediately obvious). The missing metrics seem to be related to the following:

      • ThreadPool metrics (only some initialize at startup; those are listed below)
      • Streaming Metrics
      • HintedHandoff Metrics
      • HintsService Metrics

      Here are the ThreadPool scopes that get listed:

      AntiEntropyStage
      CacheCleanupExecutor
      CompactionExecutor
      GossipStage
      HintsDispatcher
      MemtableFlushWriter
      MemtablePostFlush
      MemtableReclaimMemory
      MigrationStage
      MutationStage
      Native-Transport-Requests
      PendingRangeCalculator
      PerDiskMemtableFlushWriter_0
      ReadStage
      Repair-Task
      RequestResponseStage
      Sampler
      SecondaryIndexManagement
      ValidationExecutor
      ViewBuildExecutor
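
A list like the one above can be pulled out of a dump of MBean names rather than read off by hand. The following is a minimal sketch, assuming the names have already been exported from JMX as plain strings shaped like `org.apache.cassandra.metrics:type=ThreadPools,path=request,scope=ReadStage,name=ActiveTasks`. The helper names and the sample input are illustrative and not part of Cassandra.

```python
from collections import defaultdict


def parse_object_name(name):
    """Split 'domain:key=value,key=value' into (domain, {key: value}).

    Simplification: quoted values containing commas are not handled.
    """
    domain, _, props = name.partition(":")
    pairs = (item.split("=", 1) for item in props.split(",") if "=" in item)
    return domain, {key.strip(): value.strip() for key, value in pairs}


def scopes_by_type(object_names):
    """Map each metric type to the sorted list of distinct scopes seen for it."""
    found = defaultdict(set)
    for name in object_names:
        _, props = parse_object_name(name)
        if "type" in props and "scope" in props:
            found[props["type"]].add(props["scope"])
    return {metric_type: sorted(scopes) for metric_type, scopes in found.items()}


# Illustrative input only; real names would come from a JMX query.
sample = [
    "org.apache.cassandra.metrics:type=ThreadPools,path=request,scope=ReadStage,name=ActiveTasks",
    "org.apache.cassandra.metrics:type=ThreadPools,path=request,scope=ReadStage,name=PendingTasks",
    "org.apache.cassandra.metrics:type=ThreadPools,path=internal,scope=GossipStage,name=ActiveTasks",
    "org.apache.cassandra.metrics:type=DroppedMessage,scope=READ_REQ,name=Dropped",
]
result = scopes_by_type(sample)
```

Running the same function against dumps taken at different points (right after startup, after a repair, after streaming) would show which scopes only appear once the corresponding subsystem has been exercised.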
      

      I noticed that Keyspace Metrics have this note: "Most of these metrics are the same as the Table Metrics above, only they are aggregated at the Keyspace level." I think I've isolated the Table metrics that are not present on Keyspace; specifically:

      BloomFilterFalsePositives
      BloomFilterFalseRatio
      BytesAnticompacted
      BytesFlushed
      BytesMutatedAnticompaction
      BytesPendingRepair
      BytesRepaired
      BytesUnrepaired
      CompactionBytesWritten
      CompressionRatio
      CoordinatorReadLatency
      CoordinatorScanLatency
      CoordinatorWriteLatency
      EstimatedColumnCountHistogram
      EstimatedPartitionCount
      EstimatedPartitionSizeHistogram
      KeyCacheHitRate
      LiveSSTableCount
      MaxPartitionSize
      MeanPartitionSize
      MinPartitionSize
      MutatedAnticompactionGauge
      PercentRepaired
      RowCacheHitOutOfRange
      RowCacheHit
      RowCacheMiss
      SpeculativeSampleLatencyNanos
      SyncTime
      WaitingOnFreeMemtableSpace
      DroppedMutations
      

      Someone with greater knowledge of this area might consider it worth checking whether any of these metrics should be aggregated at the keyspace level, in case they were inadvertently missed. In any case, the documentation could now easily state which metric names to expect on Keyspace.
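
The table-versus-keyspace comparison is a plain set difference over metric names. A minimal sketch, assuming the two name lists have already been collected (for example from JMX); the function name and the short sample lists are illustrative, not a complete inventory:

```python
def table_only_metrics(table_metrics, keyspace_metrics):
    """Return metric names present on Table but not on Keyspace, sorted."""
    return sorted(set(table_metrics) - set(keyspace_metrics))


# Small illustrative subsets; the real lists are much longer.
table_names = [
    "ReadLatency",
    "WriteLatency",
    "BloomFilterFalsePositives",
    "CompressionRatio",
]
keyspace_names = ["ReadLatency", "WriteLatency"]

missing_on_keyspace = table_only_metrics(table_names, keyspace_names)
for metric in missing_on_keyspace:
    print(metric)
```

Re-running a check like this against each release would make it easy to keep the Keyspace section of the documentation in step with what is actually registered.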

      The DroppedMessage metrics have a much larger set of scopes than what is documented:

      ASYMMETRIC_SYNC_REQ
      BATCH_REMOVE_REQ
      BATCH_REMOVE_RSP
      BATCH_STORE_REQ
      BATCH_STORE_RSP
      CLEANUP_MSG
      COUNTER_MUTATION_REQ
      COUNTER_MUTATION_RSP
      ECHO_REQ
      ECHO_RSP
      FAILED_SESSION_MSG
      FAILURE_RSP
      FINALIZE_COMMIT_MSG
      FINALIZE_PROMISE_MSG
      FINALIZE_PROPOSE_MSG
      GOSSIP_DIGEST_ACK
      GOSSIP_DIGEST_ACK2
      GOSSIP_DIGEST_SYN
      GOSSIP_SHUTDOWN
      HINT_REQ
      HINT_RSP
      INTERNAL_RSP
      MUTATION_REQ
      MUTATION_RSP
      PAXOS_COMMIT_REQ
      PAXOS_COMMIT_RSP
      PAXOS_PREPARE_REQ
      PAXOS_PREPARE_RSP
      PAXOS_PROPOSE_REQ
      PAXOS_PROPOSE_RSP
      PING_REQ
      PING_RSP
      PREPARE_CONSISTENT_REQ
      PREPARE_CONSISTENT_RSP
      PREPARE_MSG
      RANGE_REQ
      RANGE_RSP
      READ_REPAIR_REQ
      READ_REPAIR_RSP
      READ_REQ
      READ_RSP
      REPAIR_RSP
      REPLICATION_DONE_REQ
      REPLICATION_DONE_RSP
      REQUEST_RSP
      SCHEMA_PULL_REQ
      SCHEMA_PULL_RSP
      SCHEMA_PUSH_REQ
      SCHEMA_PUSH_RSP
      SCHEMA_VERSION_REQ
      SCHEMA_VERSION_RSP
      SNAPSHOT_MSG
      SNAPSHOT_REQ
      SNAPSHOT_RSP
      STATUS_REQ
      STATUS_RSP
      SYNC_REQ
      SYNC_RSP
      TRUNCATE_REQ
      TRUNCATE_RSP
      VALIDATION_REQ
      VALIDATION_RSP
      _SAMPLE
      _TEST_1
      _TEST_2
      _TRACE
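
With this many scopes, the documentation might be easier to read if they were grouped. A minimal sketch that buckets scope names purely by their spelling (the `_REQ`, `_RSP` and `_MSG` suffixes, plus a leading underscore). Treating underscore-prefixed names as a separate group is an assumption made here for illustration; the ticket does not say what those scopes are for.

```python
from collections import defaultdict


def classify_scope(scope):
    """Bucket a DroppedMessage scope name by naming pattern only."""
    if scope.startswith("_"):
        # Assumption: underscore-prefixed names are grouped separately.
        return "underscore-prefixed"
    for suffix in ("_REQ", "_RSP", "_MSG"):
        if scope.endswith(suffix):
            return suffix[1:]
    return "other"


def group_scopes(scopes):
    """Return {bucket: sorted list of scope names}."""
    groups = defaultdict(list)
    for scope in scopes:
        groups[classify_scope(scope)].append(scope)
    return {bucket: sorted(names) for bucket, names in groups.items()}


# A handful of scopes taken from the list above.
sample_scopes = [
    "READ_REQ",
    "READ_RSP",
    "SNAPSHOT_MSG",
    "GOSSIP_DIGEST_SYN",
    "_TRACE",
    "_TEST_1",
]
grouped = group_scopes(sample_scopes)
```

Grouping by name alone says nothing about what each message does, so someone familiar with the messaging code would still need to confirm which groups belong in user-facing documentation.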
      

      I may still be missing some metrics, since my knowledge of what's available is limited to what I can get from JMX after Cassandra initialization (plus a few initial commands) and what's in the documentation. If a metric is missing from both, I won't know it's there. In any case, perhaps this issue can help start a discussion about the improvements that might be made, given the analysis provided so far.

              People

              • Assignee:
                Unassigned
                Reporter:
                spmallette Stephen Mallette