[CASSANDRA-8285] Move all hints related tasks to hints private executor - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Normal
Resolution: Fixed
Fix Version/s: 2.0.12, 2.1.3
Component/s: None
Labels: None
Environment:
Cassandra 2.0.11 + java-driver 2.0.8-SNAPSHOT
Cassandra 2.0.11 + ruby-driver 1.0-beta
Severity: Normal

Description

We ran 3-day driver endurance tests against Cassandra 2.0.11, and C* crashed with an OutOfMemoryError (OOME). This happened with both ruby-driver 1.0-beta and java-driver 2.0.8-SNAPSHOT.

Attached are:

OOME_node_system.log	The system.log of one Cassandra node that crashed
gc.log.gz	The GC log on the same node
heap-usage-after-gc.png	The heap occupancy evolution after every GC cycle
heap-usage-after-gc-zoom.png	A focus on when things start to go wrong

Workload:
Our test executes 5 CQL statements (select, insert, select, delete, select) for a given unique id, over 3 days, using multiple threads. There is no change in the workload during the test.
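
For readers who want to picture the workload, here is a minimal sketch of one way such a loop could look. It is not the actual test harness: the keyspace, table, and column names, the `execute` callable, and the fixed cycle count (standing in for the 3-day duration) are all assumptions made for illustration.

```python
import uuid
from concurrent.futures import ThreadPoolExecutor

# Hypothetical schema: the ticket does not name the keyspace, table, or columns.
CYCLE = (
    "SELECT * FROM ks.items WHERE id = %s",
    "INSERT INTO ks.items (id, payload) VALUES (%s, 'x')",
    "SELECT * FROM ks.items WHERE id = %s",
    "DELETE FROM ks.items WHERE id = %s",
    "SELECT * FROM ks.items WHERE id = %s",
)


def run_cycle(execute):
    """Run select, insert, select, delete, select for one fresh unique id."""
    row_id = uuid.uuid4()
    for statement in CYCLE:
        # `execute` is a placeholder for whatever sends a statement to the cluster.
        execute(statement, (row_id,))
    return row_id


def run_workload(execute, cycles, threads=4):
    """Run `cycles` independent cycles on a pool of `threads` workers."""
    with ThreadPoolExecutor(max_workers=threads) as pool:
        futures = [pool.submit(run_cycle, execute) for _ in range(cycles)]
        return [future.result() for future in futures]
```

Any callable that accepts a statement string and a parameter tuple can be passed as `execute`, so the same loop can be pointed at a real driver session or at a stub that only records calls.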

Symptoms:
In the attached log, something appears to start in Cassandra between 2014-11-06 10:29:22 and 2014-11-06 10:45:32, causing allocations that fill the heap. The node eventually gets stuck in a Full GC storm and logs an OOME.

I have also run the java-driver tests against Cassandra 1.2.19 and 2.1.1, and the error does not occur there. It seems specific to 2.0.11.

Attachments

system.log.gz	24/Nov/14 19:09	150 kB	Pierre Laporte
OOME_node_system.log	10/Nov/14 15:12	6.01 MB	Pierre Laporte
heap-usage-after-gc-zoom.png	10/Nov/14 15:12	119 kB	Pierre Laporte
heap-usage-after-gc.png	10/Nov/14 15:12	384 kB	Pierre Laporte
gc-1416849312.log.gz	24/Nov/14 19:09	2.23 MB	Pierre Laporte
gc.log.gz	10/Nov/14 15:12	5.15 MB	Pierre Laporte
8285-v3.txt	19/Dec/14 17:16	13 kB	Sam Tunnicliffe
8285-v2.txt	07/Dec/14 15:34	10 kB	Aleksey Yeschenko
8285.txt	03/Dec/14 01:49	9 kB	Aleksey Yeschenko

Issue Links

is broken by: CASSANDRA-6998 HintedHandoff - expired hints may block future hints deliveries (Resolved)

is related to: CASSANDRA-8164 OOM due to slow memory meter (Resolved)

People

Assignee: Aleksey Yeschenko

Reporter: Pierre Laporte

Authors: Aleksey Yeschenko

Reviewers: Sam Tunnicliffe

Votes: 0

Watchers: 15

Dates

Created: 10/Nov/14 15:12

Updated: 16/Apr/19 09:31

Resolved: 19/Dec/14 18:29