[CASSANDRA-2058] Load spikes due to MessagingService-generated garbage collection - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Normal
Resolution: Fixed
Fix Version/s: 0.6.11, 0.7.1
Component/s: None
Labels: None
Environment:

OpenJDK 64-Bit Server VM (build 1.6.0_0-b12, mixed mode)
Ubuntu 8.10
Linux pmc01 2.6.27-22-xen #1 SMP Fri Feb 20 23:58:13 UTC 2009 x86_64 GNU/Linux

Severity: Normal

Description

(Filing as a placeholder bug as I gather information.)

At about 10 pm on 24 Jan, I upgraded our 20-node cluster from 0.6.8 to 0.6.10, turned on the DES (dynamic endpoint snitch), and moved some CFs from one KS into another (drain the whole cluster, take it down, move files, change schema, bring it back up). Since then, I've had four storms in which a node's load shoots to 700+ (400% CPU on a 4-CPU machine) and the node becomes totally unresponsive. After a moment or two like that, its neighbour dies too, and the failure cascades around the ring. Unfortunately, because of the high load, I'm not able to get into the machine to pull a thread dump and see wtf it's doing as it happens.

I've also had an issue where a single node spikes to high load but recovers. This may or may not be the same issue as the one above, from which the nodes don't recover, but both are new behaviour.

Attachments

- graph b.png, 26/Jan/11 03:30, 90 kB, David King
- graph a.png, 26/Jan/11 03:30, 86 kB, David King
- cassandra.pmc14.log.bz2, 26/Jan/11 03:38, 1.71 MB, David King
- cassandra.pmc01.log.bz2, 26/Jan/11 01:29, 662 kB, David King
- 2058-0.7-v3.txt, 27/Jan/11 18:40, 26 kB, Jonathan Ellis
- 2058-0.7-v2.txt, 27/Jan/11 18:30, 27 kB, Brandon Williams
- 2058-0.7.txt, 27/Jan/11 04:06, 22 kB, Jonathan Ellis
- 2058.txt, 26/Jan/11 21:49, 28 kB, Jonathan Ellis

People

Assignee: Jonathan Ellis

Reporter: David King

Authors: Jonathan Ellis

Reviewers: Brandon Williams

Votes: 0

Watchers: 7

Dates

Created: 26/Jan/11 01:20

Updated: 16/Apr/19 09:33

Resolved: 15/Feb/11 19:48

Time Tracking

Estimated: 0.4h

Remaining: 0.4h

Logged: Not Specified