Details
- Type: Improvement
- Status: Open
- Priority: Low
- Resolution: Unresolved
Description
When running a production cluster, one common operational issue is quantifying GC pauses caused by ongoing requests.
Since different queries return varying amounts of data, you can easily get yourself into a situation where a couple of bad actors in the system trigger a stop-the-world pause. More likely, the aggregate garbage generated on a single node across all in-flight requests causes a GC.
It would be very useful for operators to see how much garbage the system is generating to handle in-flight mutations and queries.
It would also be nice to have a log of the queries that generate the most garbage, so operators can track them, as well as a histogram.
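As a rough illustration of what such accounting could look like, the sketch below measures the bytes allocated by the calling thread while a request runs and records the result in a power-of-two histogram. It relies on the HotSpot-specific `com.sun.management.ThreadMXBean.getThreadAllocatedBytes`. The class `AllocationTracker` and its methods are hypothetical names for this example, not existing Cassandra code. It also assumes the request runs entirely on one thread, which is not true of Cassandra's staged request path, and it counts allocation rather than garbage, so it only gives an upper bound.

```java
import java.lang.management.ManagementFactory;
import java.util.concurrent.atomic.AtomicLongArray;
import java.util.function.Supplier;

// Hypothetical helper for illustration; not part of Cassandra.
public class AllocationTracker {
    // HotSpot-specific extension of the standard ThreadMXBean.
    private static final com.sun.management.ThreadMXBean THREADS =
            (com.sun.management.ThreadMXBean) ManagementFactory.getThreadMXBean();

    // Bucket b (b >= 1) counts requests that allocated [2^(b-1), 2^b) bytes;
    // bucket 0 counts requests with zero or unknown allocation.
    private static final AtomicLongArray HISTOGRAM = new AtomicLongArray(64);

    /** Result of a measured call: its value plus bytes allocated on this thread. */
    public static final class Measured<T> {
        public final T value;
        public final long allocatedBytes; // -1 if the JVM cannot report it

        Measured(T value, long allocatedBytes) {
            this.value = value;
            this.allocatedBytes = allocatedBytes;
        }
    }

    /** Maps a byte count to its power-of-two histogram bucket. */
    public static int bucketFor(long bytes) {
        return bytes <= 0 ? 0 : 64 - Long.numberOfLeadingZeros(bytes);
    }

    /** Runs the request on the calling thread and records how much it allocated. */
    public static <T> Measured<T> measure(Supplier<T> request) {
        long id = Thread.currentThread().getId();
        long before = THREADS.getThreadAllocatedBytes(id);
        T value = request.get();
        long after = THREADS.getThreadAllocatedBytes(id);
        // getThreadAllocatedBytes returns -1 when unsupported or disabled.
        long allocated = (before < 0 || after < 0) ? -1 : after - before;
        HISTOGRAM.incrementAndGet(bucketFor(allocated));
        return new Measured<>(value, allocated);
    }

    /** Number of measured requests that fell into the given bucket. */
    public static long countInBucket(int bucket) {
        return HISTOGRAM.get(bucket);
    }

    public static void main(String[] args) {
        // Stand-in for a query that materializes about 1 MiB of results.
        Measured<byte[]> m = measure(() -> new byte[1 << 20]);
        System.out.println("allocated bytes: " + m.allocatedBytes
                + ", bucket: " + bucketFor(m.allocatedBytes));
    }
}
```

A log of the worst offenders could be built on the same measurement by recording the query text whenever `allocatedBytes` exceeds a configurable threshold.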
Attachments
Issue Links
- is related to CASSANDRA-3017 add a Message size limit (Resolved)
- relates to CASSANDRA-9318 Bound the number of in-flight requests at the coordinator (Resolved)