Debugging Cassandra performance problems in a very large production environment with diverse workloads involves many challenges. Queries can cause many different types of problems, but the major problem is that Cassandra exposes very little information for diagnosing performance issues. Most of the time, the user/ops team is ready to act on problematic queries, provided Cassandra gives more details about the problems seen on the server side.
A lot of work has already been done as part of CASSANDRA-12403, the large partition warning, and the tombstone warning. However, I think Cassandra needs to provide more concrete information, tunable parameters, a tunable way to consume this information, etc. Hence, this JIRA proposes a common way of detecting and logging the different problems in a Cassandra cluster that could affect its performance.
The goal of this effort is to reduce the burden on ops of running Cassandra at large scale, and to help beginners quickly identify performance problems with Cassandra.
Please see this document for details such as what is currently available, the motivation behind developing this common framework, the architecture, samples, etc.: https://docs.google.com/document/d/1D0HNjC3a7gnuKnR_iDXLI5mvn1zQxtV7tloMaLYIENE/edit?usp=sharing
Here is the patch with this feature:
Please review the doc/patch and share your feedback.