Details
Description
We're pretty heavy users of CQL3 and CQL3 collection types. Occasionally, some nodes of the cluster will become extremely sluggish and the cluster as a whole starts to become unresponsive, reads will time out, and nodes will drop mutation messages. This happens when nodes flush Memtables to disk (based on my tail of the system.log on each node).
I'm a curious guy, so I attached jvisualvm (v1.3.3) to the JVMs that were having this problem. These nodes are spending up to 98% of CPU in org.apache.cassandra.db.marshal.AbstractCompositeType.compare(AbstractCompositeType.java:78). I will attach a thread dump.
Thi is causing us quite a headache, because we're unable to figure what would be causing this. We tried tuning several configuration settings (column cache size, row key cache size), but the cluster exhibits the same issues even with the default configuration (except for a modified num_tokens and listen_address).
Attachments
Attachments
Issue Links
- duplicates
-
CASSANDRA-5677 Performance improvements of RangeTombstones/IntervalTree
- Resolved