Details
- Improvement
- Status: Open
- Normal
- Resolution: Unresolved
- None
- Code Clarity
- Low Hanging Fruit
- All
- None
Description
CASSANDRA-15582 covers quality testing of metrics, and it was mentioned that reviewing and improving the metrics documentation would fall within that scope. Please consider the following analysis when deciding what improvements to make here:
Please see this spreadsheet that itemizes almost all of Cassandra's metrics and whether they are documented or not (and other notes). The spreadsheet covers "almost all" because some metrics don't seem to initialize as part of Cassandra startup (I was able to trigger some to initialize, but not all were immediately obvious). The missing metrics seem to be related to the following:
- ThreadPool metrics: only some initialize at startup; those that do are listed below
- Streaming Metrics
- HintedHandoff Metrics
- HintsService Metrics
Here are the ThreadPool scopes that get listed:
AntiEntropyStage CacheCleanupExecutor CompactionExecutor GossipStage HintsDispatcher MemtableFlushWriter MemtablePostFlush MemtableReclaimMemory MigrationStage MutationStage Native-Transport-Requests PendingRangeCalculator PerDiskMemtableFlushWriter_0 ReadStage Repair-Task RequestResponseStage Sampler SecondaryIndexManagement ValidationExecutor ViewBuildExecutor
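A scope list like the one above can be derived from a dump of JMX object names. The following is a minimal sketch, assuming the names have already been exported to a list of strings by some JMX client and follow the `org.apache.cassandra.metrics:type=...,scope=...,name=...` pattern. The helper names and the sample entries are hypothetical, and the parsing is deliberately naive (it does not handle quoted values that contain commas).

```python
from collections import defaultdict


def parse_object_name(object_name):
    """Split 'domain:key=value,key=value' into (domain, {key: value})."""
    domain, _, props = object_name.partition(":")
    pairs = (p.split("=", 1) for p in props.split(",") if "=" in p)
    return domain, {k.strip(): v.strip() for k, v in pairs}


def scopes_by_type(object_names, domain="org.apache.cassandra.metrics"):
    """Group the distinct 'scope' values seen for each metric 'type'."""
    scopes = defaultdict(set)
    for name in object_names:
        dom, props = parse_object_name(name)
        if dom == domain and "type" in props and "scope" in props:
            scopes[props["type"]].add(props["scope"])
    return {metric_type: sorted(found) for metric_type, found in scopes.items()}


# Hypothetical sample of exported object names (not real output).
sample = [
    "org.apache.cassandra.metrics:type=ThreadPools,path=request,scope=ReadStage,name=ActiveTasks",
    "org.apache.cassandra.metrics:type=ThreadPools,path=request,scope=ReadStage,name=PendingTasks",
    "org.apache.cassandra.metrics:type=ThreadPools,path=internal,scope=GossipStage,name=ActiveTasks",
    "org.apache.cassandra.metrics:type=DroppedMessage,scope=READ_REQ,name=Dropped",
]

if __name__ == "__main__":
    for metric_type, found in scopes_by_type(sample).items():
        print(metric_type, " ".join(found))
```

Because only metrics that have been initialized are registered with JMX, a list built this way has the same blind spot described above: scopes that have not yet been triggered will not appear.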
I noticed that Keyspace Metrics have this note: "Most of these metrics are the same as the Table Metrics above, only they are aggregated at the Keyspace level." I think I've isolated the Table metrics that are not present on Keyspace; specifically:
BloomFilterFalsePositives BloomFilterFalseRatio BytesAnticompacted BytesFlushed BytesMutatedAnticompaction BytesPendingRepair BytesRepaired BytesUnrepaired CompactionBytesWritten CompressionRatio CoordinatorReadLatency CoordinatorScanLatency CoordinatorWriteLatency EstimatedColumnCountHistogram EstimatedPartitionCount EstimatedPartitionSizeHistogram KeyCacheHitRate LiveSSTableCount MaxPartitionSize MeanPartitionSize MinPartitionSize MutatedAnticompactionGauge PercentRepaired RowCacheHitOutOfRange RowCacheHit RowCacheMiss SpeculativeSampleLatencyNanos SyncTime WaitingOnFreeMemtableSpace DroppedMutations
Someone with greater knowledge of this area might consider it worth the effort to check whether any of these metrics should be aggregated to the keyspace level, in case they were inadvertently missed. In any case, the documentation could now easily list which metric names to expect on Keyspace.
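The Table-versus-Keyspace comparison above amounts to a set difference over metric names. Here is a minimal sketch of how it could be reproduced, assuming a list of JMX object-name strings in which Table and Keyspace metrics carry `type=Table` and `type=Keyspace` properties along with a `name` property. The function names and sample data are hypothetical and illustrate the approach only.

```python
def metric_names(object_names, metric_type):
    """Collect the distinct 'name' values registered under one metric type."""
    names = set()
    for object_name in object_names:
        _, _, props = object_name.partition(":")
        fields = dict(p.split("=", 1) for p in props.split(",") if "=" in p)
        if fields.get("type") == metric_type and "name" in fields:
            names.add(fields["name"])
    return names


def table_only_metrics(object_names):
    """Return metric names present on Table but absent from Keyspace."""
    table = metric_names(object_names, "Table")
    keyspace = metric_names(object_names, "Keyspace")
    return sorted(table - keyspace)


# Hypothetical sample of exported object names (not real output).
sample = [
    "org.apache.cassandra.metrics:type=Table,keyspace=ks,scope=t1,name=ReadLatency",
    "org.apache.cassandra.metrics:type=Table,keyspace=ks,scope=t1,name=BloomFilterFalseRatio",
    "org.apache.cassandra.metrics:type=Keyspace,keyspace=ks,name=ReadLatency",
]

if __name__ == "__main__":
    print(table_only_metrics(sample))
```

The same difference taken in the other direction would show any Keyspace metric names that have no Table counterpart.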
The DroppedMessage metrics have a much larger set of scopes than what is documented:
ASYMMETRIC_SYNC_REQ BATCH_REMOVE_REQ BATCH_REMOVE_RSP BATCH_STORE_REQ BATCH_STORE_RSP CLEANUP_MSG COUNTER_MUTATION_REQ COUNTER_MUTATION_RSP ECHO_REQ ECHO_RSP FAILED_SESSION_MSG FAILURE_RSP FINALIZE_COMMIT_MSG FINALIZE_PROMISE_MSG FINALIZE_PROPOSE_MSG GOSSIP_DIGEST_ACK GOSSIP_DIGEST_ACK2 GOSSIP_DIGEST_SYN GOSSIP_SHUTDOWN HINT_REQ HINT_RSP INTERNAL_RSP MUTATION_REQ MUTATION_RSP PAXOS_COMMIT_REQ PAXOS_COMMIT_RSP PAXOS_PREPARE_REQ PAXOS_PREPARE_RSP PAXOS_PROPOSE_REQ PAXOS_PROPOSE_RSP PING_REQ PING_RSP PREPARE_CONSISTENT_REQ PREPARE_CONSISTENT_RSP PREPARE_MSG RANGE_REQ RANGE_RSP READ_REPAIR_REQ READ_REPAIR_RSP READ_REQ READ_RSP REPAIR_RSP REPLICATION_DONE_REQ REPLICATION_DONE_RSP REQUEST_RSP SCHEMA_PULL_REQ SCHEMA_PULL_RSP SCHEMA_PUSH_REQ SCHEMA_PUSH_RSP SCHEMA_VERSION_REQ SCHEMA_VERSION_RSP SNAPSHOT_MSG SNAPSHOT_REQ SNAPSHOT_RSP STATUS_REQ STATUS_RSP SYNC_REQ SYNC_RSP TRUNCATE_REQ TRUNCATE_RSP VALIDATION_REQ VALIDATION_RSP _SAMPLE _TEST_1 _TEST_2 _TRACE
I suppose I may still be missing some metrics, as my knowledge of what's available is limited to what I can get from JMX after Cassandra initialization (plus a few initial commands) and what's in the documentation. If a metric is absent from both, I won't know it exists. Anyway, perhaps this issue can help start some discussion around the improvements that might be made, given the analysis provided so far.
Issue Links
- depends upon: CASSANDRA-15909 Make Table/Keyspace Metric Names Consistent With Each Other (Resolved)
- is a child of: CASSANDRA-15582 4.0 quality testing: metrics (In Progress)
- is related to: CASSANDRA-16261 Prevent unbounded number of flushing tasks (Resolved)