Description
We ran 3-day driver endurance tests against Cassandra 2.0.11, and C* crashed with an OOME (OutOfMemoryError). This happened with both ruby-driver 1.0-beta and java-driver 2.0.8-snapshot.
Attached are:
- OOME_node_system.log: the system.log of one Cassandra node that crashed
- gc.log.gz: the GC log on the same node
- heap-usage-after-gc.png: the heap occupancy evolution after every GC cycle
- heap-usage-after-gc-zoom.png: a focus on when things start to go wrong
Workload:
Our test executes 5 CQL statements (select, insert, select, delete, select) for a given unique id, for 3 days, using multiple threads. There is no change in the workload during the test.
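For readers trying to reproduce this, the shape of the workload can be sketched roughly as follows. This is only an illustration under assumptions: the keyspace and table name (`endurance.items`), the `payload` column, the use of a UUID as the unique id, and the thread and iteration counts are all hypothetical, since the report does not give the schema or the test code. The `execute` callable stands in for whatever the driver session exposes.

```python
import threading
import uuid

# Hypothetical schema: the report does not name the keyspace, table, or columns.
TABLE = "endurance.items"


def statements_for(row_id):
    """Return the five CQL statements, in the order given in the report."""
    select = f"SELECT * FROM {TABLE} WHERE id = {row_id}"
    return [
        select,
        f"INSERT INTO {TABLE} (id, payload) VALUES ({row_id}, 'x')",
        select,
        f"DELETE FROM {TABLE} WHERE id = {row_id}",
        select,
    ]


def worker(execute, iterations):
    """Run the five-statement cycle `iterations` times, one fresh id per cycle."""
    for _ in range(iterations):
        for cql in statements_for(uuid.uuid4()):
            execute(cql)


def run_workload(execute, threads=4, iterations=10):
    """Start `threads` workers that all call `execute`, then wait for them."""
    pool = [
        threading.Thread(target=worker, args=(execute, iterations))
        for _ in range(threads)
    ]
    for t in pool:
        t.start()
    for t in pool:
        t.join()


if __name__ == "__main__":
    # Stand-in for a driver session: record statements instead of sending them.
    executed = []
    lock = threading.Lock()

    def record(cql):
        with lock:
            executed.append(cql)

    run_workload(record, threads=2, iterations=3)
    print(len(executed))  # 2 threads * 3 iterations * 5 statements
```

The actual test ran for a fixed duration (3 days) rather than a fixed number of iterations, so a closer reproduction would loop until a deadline instead of counting cycles.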
Symptoms:
In the attached logs, something appears to start in Cassandra between 2014-11-06 10:29:22 and 2014-11-06 10:45:32 and begins allocating memory until the heap fills. The node eventually gets stuck in a Full GC storm and logs an OOME.
I also ran the java-driver tests against Cassandra 1.2.19 and 2.1.1, and the error does not occur there. It seems specific to 2.0.11.
Attachments
Issue Links
- is broken by
  - CASSANDRA-6998 HintedHandoff - expired hints may block future hints deliveries (Resolved)
- is related to
  - CASSANDRA-8164 OOM due to slow memory meter (Resolved)