[CASSANDRA-9798] Cassandra seems to have deadlocks during flush operations - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Normal
Resolution: Fixed
Fix Version/s: 2.1.x
Component/s: None
Labels: None
Environment:

4x HP Gen9 dl 360 servers
2x8 cpu each (Intel(R) Xeon E5-2667 v3 @ 3.20GHz)
6x900GB 10kRPM disk for data
1x900GB 10kRPM disk for commitlog
64GB ram
ETH: 10Gb/s
Red Hat Enterprise Linux Server release 6.6 (Santiago) 2.6.32-504.el6.x86_64
java build 1.8.0_45-b14 (openjdk) (tested on oracle java 8 too)


Severity: Normal

Description

Hi,
We noticed a problem with dropped mutations in MutationStage. Typically, on one random node, the thread pools end up in this state:
MutationStage: "active" is full, "pending" is increasing, "completed" is stalled.
MemtableFlushWriter: "active" is 6, "pending" is 25, "completed" is stalled.
MemtablePostFlush: "active" is 1, "pending" is 29, "completed" is stalled.

After some time (30 s to 10 min) the pending mutations are dropped and everything works again.
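
The stalled state described above can be checked mechanically by comparing two snapshots of the thread-pool counters taken a few seconds apart (for example, copied by hand from two runs of `nodetool tpstats`). The following is a minimal sketch, not part of the original report: the `PoolStats` type and the `stalled_pools` helper are hypothetical names, and the "completed" values in the usage example are invented for illustration. Only the active and pending counts for the two memtable pools come from the description.

```python
from typing import Dict, List, NamedTuple


class PoolStats(NamedTuple):
    """Counters for one thread pool (hypothetical helper type)."""
    active: int
    pending: int
    completed: int


def stalled_pools(before: Dict[str, PoolStats],
                  after: Dict[str, PoolStats]) -> List[str]:
    """Return pools that have queued work but completed nothing between snapshots."""
    stalled = []
    for name, later in after.items():
        earlier = before.get(name)
        if earlier is None:
            continue  # pool not present in the first snapshot; nothing to compare
        if later.pending > 0 and later.completed == earlier.completed:
            stalled.append(name)
    return sorted(stalled)


# Usage: "completed" values and the CompactionExecutor row are made up.
before = {
    "MemtableFlushWriter": PoolStats(active=6, pending=25, completed=1000),
    "MemtablePostFlush": PoolStats(active=1, pending=29, completed=2000),
    "CompactionExecutor": PoolStats(active=2, pending=39, completed=500),
}
after = {
    "MemtableFlushWriter": PoolStats(active=6, pending=25, completed=1000),
    "MemtablePostFlush": PoolStats(active=1, pending=29, completed=2000),
    "CompactionExecutor": PoolStats(active=2, pending=38, completed=501),
}
print(stalled_pools(before, after))
```

A pool whose "completed" counter advances, even slowly, is treated as healthy here; the check only flags pools that make no progress at all while work is queued.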
When it happened:
1. CPU idle is ~95%.
2. No long GC pauses or increased activity.
3. Memory usage is 3.5 GB of 8 GB.
4. Only writes are processed by Cassandra.
5. The problems appeared when LOAD > 400 GB/node.
6. Cassandra 2.1.6.

There is a gap in the logs:

INFO  08:47:01 Timed out replaying hints to /192.168.100.83; aborting (0 delivered)
INFO  08:47:01 Enqueuing flush of hints: 7870567 (0%) on-heap, 0 (0%) off-heap
INFO  08:47:30 Enqueuing flush of table1: 95301807 (4%) on-heap, 0 (0%) off-heap
INFO  08:47:31 Enqueuing flush of table1: 60462632 (3%) on-heap, 0 (0%) off-heap
INFO  08:47:31 Enqueuing flush of table2: 76973746 (4%) on-heap, 0 (0%) off-heap
INFO  08:47:31 Enqueuing flush of table1: 84290135 (4%) on-heap, 0 (0%) off-heap
INFO  08:47:32 Enqueuing flush of table3: 56926652 (3%) on-heap, 0 (0%) off-heap
INFO  08:47:32 Enqueuing flush of table1: 85124218 (4%) on-heap, 0 (0%) off-heap
INFO  08:47:33 Enqueuing flush of table2: 95663415 (4%) on-heap, 0 (0%) off-heap
INFO  08:47:58 CompactionManager                 2        39
INFO  08:47:58 Writing Memtable-table2@1767938721(13843064 serialized bytes, 162359 ops, 4%/0% of on/off-heap limit)
INFO  08:47:58 Writing Memtable-hints@1433125911(478703 serialized bytes, 424 ops, 0%/0% of on/off-heap limit)
INFO  08:47:58 Writing Memtable-table2@1318583275(11783615 serialized bytes, 137378 ops, 4%/0% of on/off-heap limit)
INFO  08:47:58 Enqueuing flush of compactions_in_progress: 969 (0%) on-heap, 0 (0%) off-heap
INFO  08:47:58 Writing Memtable-table1@541175113(17221327 serialized bytes, 180792 ops, 4%/0% of on/off-heap limit)
INFO  08:47:58 Writing Memtable-table1@1361154669(27138519 serialized bytes, 273472 ops, 6%/0% of on/off-heap limit)

INFO  08:48:03 2176 MUTATION messages dropped in last 5000ms
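
Gaps like the ones in the excerpt above can be found by scanning the timestamps. This is a rough sketch, not a tool used in the original report: `find_gaps` and the 20-second threshold are assumptions, the regular expression assumes the `LEVEL  HH:MM:SS message` layout shown in the excerpt, and midnight wrap-around is not handled.

```python
import re
from typing import Iterable, List, Tuple

# Matches lines such as "INFO  08:47:01 Enqueuing flush of ..."
_TIMESTAMP = re.compile(r"^[A-Z]+\s+(\d{2}):(\d{2}):(\d{2})\s")


def find_gaps(lines: Iterable[str], min_gap_s: int = 20) -> List[Tuple[str, str, int]]:
    """Return (previous stamp, next stamp, seconds) for each gap >= min_gap_s."""
    gaps = []
    prev = None  # (seconds since midnight, "HH:MM:SS") of the last stamped line
    for line in lines:
        match = _TIMESTAMP.match(line)
        if not match:
            continue  # wrapped or blank line without its own timestamp
        hours, minutes, seconds = map(int, match.groups())
        now = hours * 3600 + minutes * 60 + seconds
        stamp = "%02d:%02d:%02d" % (hours, minutes, seconds)
        if prev is not None and now - prev[0] >= min_gap_s:
            gaps.append((prev[1], stamp, now - prev[0]))
        prev = (now, stamp)
    return gaps


# Usage with a few lines taken from the excerpt above.
sample = [
    "INFO  08:47:01 Enqueuing flush of hints: 7870567 (0%) on-heap, 0 (0%) off-heap",
    "INFO  08:47:30 Enqueuing flush of table1: 95301807 (4%) on-heap, 0 (0%) off-heap",
    "INFO  08:47:33 Enqueuing flush of table2: 95663415 (4%) on-heap, 0 (0%) off-heap",
    "INFO  08:47:58 CompactionManager                 2        39",
    "INFO  08:48:03 2176 MUTATION messages dropped in last 5000ms",
]
for start, end, length in find_gaps(sample):
    print("no log output between %s and %s (%d s)" % (start, end, length))
```

On these sample lines the sketch reports the 29-second silence after 08:47:01 and the 25-second silence between the last "Enqueuing flush" line and the first "Writing Memtable" activity.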

Use case:
100% writes at 100Mb/s, a couple of column families (CF) with ~10 columns each, max cell size 100 B.
Both CMS and G1GC were tested, with no difference.

Attachments

cassandra.2.1.8.log    20/Jul/15 11:47    9 kB       Łukasz Mrożkiewicz
cassandra.log          14/Jul/15 11:18    110 kB     Łukasz Mrożkiewicz
cassandra.yaml         20/Jul/15 12:29    37 kB      Łukasz Mrożkiewicz
cassandra.yaml         14/Jul/15 10:56    37 kB      Łukasz Mrożkiewicz
gc.log.0.current       14/Jul/15 10:56    219 kB     Łukasz Mrożkiewicz
stack.txt              20/Jul/15 12:41    1.07 MB    Łukasz Mrożkiewicz
topHbn1.txt            20/Jul/15 12:41    221 kB     Łukasz Mrożkiewicz

People

Assignee: Unassigned

Reporter: Łukasz Mrożkiewicz

Votes: 0

Watchers: 7

Dates

Created: 14/Jul/15 10:56

Updated: 16/Apr/19 09:31

Resolved: 15/Sep/15 07:52