Details
Bug
Status: Resolved
Normal
Resolution: Fixed
4x HP Gen9 dl 360 servers
2x8 cpu each (Intel(R) Xeon E5-2667 v3 @ 3.20GHz)
6x900GB 10kRPM disk for data
1x900GB 10kRPM disk for commitlog
64GB ram
ETH: 10Gb/s
Red Hat Enterprise Linux Server release 6.6 (Santiago) 2.6.32-504.el6.x86_64
java build 1.8.0_45-b14 (openjdk) (tested on oracle java 8 too)
Description
Hi,
We noticed a problem with dropped mutations in MutationStage. Usually, on one random node, the following situation occurs:
MutationStage: "active" is full, "pending" is increasing, "completed" is stalled.
MemtableFlushWriter: "active" is 6, "pending" is 25, "completed" is stalled.
MemtablePostFlush: "active" is 1, "pending" is 29, "completed" is stalled.
After some time (30s-10min) the pending mutations are dropped and everything works again.
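For anyone trying to catch this state while it is happening, here is a minimal sketch that compares two thread pool snapshots and reports pools that have pending work but whose "completed" counter did not move. It assumes the snapshots are text in the usual `nodetool tpstats` layout (pool name, then Active, Pending, Completed columns); the function names are hypothetical, and the numbers in the sample are illustrative, with only the active/pending values for the flush pools taken from this report.

```python
def parse_tpstats(text):
    """Map pool name -> (active, pending, completed) from tpstats-style rows."""
    pools = {}
    for line in text.splitlines():
        fields = line.split()
        # Pool rows are a name followed by at least three integer columns;
        # the header row and shorter rows are skipped by this check.
        if len(fields) >= 4 and all(f.isdigit() for f in fields[1:4]):
            pools[fields[0]] = tuple(int(f) for f in fields[1:4])
    return pools


def stalled_pools(before, after):
    """Return pools with pending work whose completed count did not advance."""
    stalled = []
    for name, (_, pending, completed) in after.items():
        if name in before and pending > 0 and completed == before[name][2]:
            stalled.append(name)
    return sorted(stalled)


# Illustrative snapshots (completed counts and MutationStage values are made up).
SNAPSHOT_1 = """
Pool Name                    Active   Pending      Completed   Blocked  All time blocked
MutationStage                    32      1500        9000000         0                 0
MemtableFlushWriter               6        25           4000         0                 0
MemtablePostFlush                 1        29           5000         0                 0
CompactionExecutor                2        39            700         0                 0
"""

SNAPSHOT_2 = """
Pool Name                    Active   Pending      Completed   Blocked  All time blocked
MutationStage                    32      4000        9000000         0                 0
MemtableFlushWriter               6        25           4000         0                 0
MemtablePostFlush                 1        29           5000         0                 0
CompactionExecutor                2        39            705         0                 0
"""

result = stalled_pools(parse_tpstats(SNAPSHOT_1), parse_tpstats(SNAPSHOT_2))
print(result)
```

Taking the two snapshots a few seconds apart should be enough to tell a stalled pool from one that is merely busy, since a busy pool still advances its completed count.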
When it happens:
1. CPU idle is ~95%.
2. There are no long GC pauses or increased GC activity.
3. Memory usage is 3.5GB out of 8GB.
4. Only writes are being processed by Cassandra.
5. The problems appear when LOAD > 400GB/node.
6. The Cassandra version is 2.1.6.
There is a gap in the logs:
INFO 08:47:01 Timed out replaying hints to /192.168.100.83; aborting (0 delivered)
INFO 08:47:01 Enqueuing flush of hints: 7870567 (0%) on-heap, 0 (0%) off-heap
INFO 08:47:30 Enqueuing flush of table1: 95301807 (4%) on-heap, 0 (0%) off-heap
INFO 08:47:31 Enqueuing flush of table1: 60462632 (3%) on-heap, 0 (0%) off-heap
INFO 08:47:31 Enqueuing flush of table2: 76973746 (4%) on-heap, 0 (0%) off-heap
INFO 08:47:31 Enqueuing flush of table1: 84290135 (4%) on-heap, 0 (0%) off-heap
INFO 08:47:32 Enqueuing flush of table3: 56926652 (3%) on-heap, 0 (0%) off-heap
INFO 08:47:32 Enqueuing flush of table1: 85124218 (4%) on-heap, 0 (0%) off-heap
INFO 08:47:33 Enqueuing flush of table2: 95663415 (4%) on-heap, 0 (0%) off-heap
INFO 08:47:58 CompactionManager 2 39
INFO 08:47:58 Writing Memtable-table2@1767938721(13843064 serialized bytes, 162359 ops, 4%/0% of on/off-heap limit)
INFO 08:47:58 Writing Memtable-hints@1433125911(478703 serialized bytes, 424 ops, 0%/0% of on/off-heap limit)
INFO 08:47:58 Writing Memtable-table2@1318583275(11783615 serialized bytes, 137378 ops, 4%/0% of on/off-heap limit)
INFO 08:47:58 Enqueuing flush of compactions_in_progress: 969 (0%) on-heap, 0 (0%) off-heap
INFO 08:47:58 Writing Memtable-table1@541175113(17221327 serialized bytes, 180792 ops, 4%/0% of on/off-heap limit)
INFO 08:47:58 Writing Memtable-table1@1361154669(27138519 serialized bytes, 273472 ops, 6%/0% of on/off-heap limit)
INFO 08:48:03 2176 MUTATION messages dropped in last 5000ms
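To find such gaps in a longer log, something like the following rough sketch could help. It assumes each line starts with a level and an `HH:MM:SS` timestamp as in the excerpt above, does not handle logs that cross midnight, and uses a hypothetical function name and an arbitrary 20 second threshold.

```python
from datetime import datetime


def find_log_gaps(lines, min_gap_seconds=20):
    """Return (previous_time, next_time, gap_seconds) for each silent period."""
    gaps = []
    previous = None
    for line in lines:
        parts = line.split(maxsplit=2)
        if len(parts) < 2:
            continue
        try:
            # Second field is expected to be the HH:MM:SS timestamp.
            current = datetime.strptime(parts[1], "%H:%M:%S")
        except ValueError:
            continue  # not a timestamped line, e.g. a stack trace
        if previous is not None:
            delta = int((current - previous).total_seconds())
            if delta >= min_gap_seconds:
                gaps.append((previous.strftime("%H:%M:%S"), parts[1], delta))
        previous = current
    return gaps


# A few lines taken from the log excerpt in this report.
sample = [
    "INFO 08:47:01 Enqueuing flush of hints: 7870567 (0%) on-heap, 0 (0%) off-heap",
    "INFO 08:47:30 Enqueuing flush of table1: 95301807 (4%) on-heap, 0 (0%) off-heap",
    "INFO 08:47:33 Enqueuing flush of table2: 95663415 (4%) on-heap, 0 (0%) off-heap",
    "INFO 08:47:58 CompactionManager 2 39",
    "INFO 08:48:03 2176 MUTATION messages dropped in last 5000ms",
]

gaps = find_log_gaps(sample)
for start, end, seconds in gaps:
    print(f"no log output between {start} and {end} ({seconds}s)")
```

On the sample lines this reports the two silent periods, 08:47:01 to 08:47:30 and 08:47:33 to 08:47:58, the second of which ends just before the dropped MUTATION message.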
Use case:
100% writes at 100Mb/s, a couple of CFs with ~10 columns each, max cell size 100B.
Both CMS and G1GC were tested, with no difference.