Details
Bug
Status: Resolved
Normal
Resolution: Duplicate
None
Normal
Description
We deployed a 3-node cluster (Cassandra 2.1.5) under a steady write load (about 2k writes per second) to a CF using DTCS, with a default TTL of 43200 s and gc_grace of 21600 s. The CF contained insert-only, complete time-series data. We found that Cassandra occasionally keeps writing log lines like these:
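To make the timing concrete, this minimal Python sketch turns the table settings above into a per-cell timeline. It assumes a simplified model in which a cell written at time t stops being readable at t + TTL and only becomes eligible for purging by compaction after a further gc_grace. The names `DEFAULT_TTL`, `GC_GRACE` and `cell_lifecycle` are illustrative, not Cassandra APIs, and the write time in the example is hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Table settings quoted in this report, in seconds.
DEFAULT_TTL = 43200  # default TTL: 12 hours
GC_GRACE = 21600     # gc_grace: 6 hours


def cell_lifecycle(written_at: datetime) -> tuple[datetime, datetime]:
    """Return (expires_at, purgeable_after) for a cell written at `written_at`.

    Simplified model (an assumption): the cell expires once its TTL elapses,
    and compaction may discard it only after a further gc_grace seconds.
    """
    expires_at = written_at + timedelta(seconds=DEFAULT_TTL)
    purgeable_after = expires_at + timedelta(seconds=GC_GRACE)
    return expires_at, purgeable_after


if __name__ == "__main__":
    # Hypothetical write time, chosen only to illustrate the arithmetic.
    write = datetime(2015, 1, 1, 0, 0, tzinfo=timezone.utc)
    expires, purgeable = cell_lifecycle(write)
    print(expires.isoformat())    # 2015-01-01T12:00:00+00:00
    print(purgeable.isoformat())  # 2015-01-01T18:00:00+00:00
```

In other words, under these settings nothing written to the CF should survive compaction much beyond 18 hours after it was written.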
INFO [CompactionExecutor:30551] 2015-06-26 18:10:06,195 CompactionTask.java:270 - Compacted 1 sstables to [/home/cassandra/workdata/data/sen_vaas_test/nodestatus-f96c7c50155811e589f69752ac9b06c7/sen_vaas_test-nodestatus-ka-2516270,]. 449 bytes to 449 (~100% of original) in 12ms = 0.035683MB/s. 4 total partitions merged to 4. Partition merge counts were {1:4, }
INFO [CompactionExecutor:30551] 2015-06-26 18:10:06,241 CompactionTask.java:140 - Compacting [SSTableReader(path='/home/cassandra/workdata/data/sen_vaas_test/nodestatus-f96c7c50155811e589f69752ac9b06c7/sen_vaas_test-nodestatus-ka-2516270-Data.db')]
INFO [CompactionExecutor:30551] 2015-06-26 18:10:06,253 CompactionTask.java:270 - Compacted 1 sstables to [/home/cassandra/workdata/data/sen_vaas_test/nodestatus-f96c7c50155811e589f69752ac9b06c7/sen_vaas_test-nodestatus-ka-2516271,]. 449 bytes to 449 (~100% of original) in 12ms = 0.035683MB/s. 4 total partitions merged to 4. Partition merge counts were
It seems that Cassandra kept compacting a single SSTable, several times per second, for many hours. Tons of log lines were written and one CPU core was saturated during this time. The endless compaction finally ended when another compaction started on a group of SSTables (including the problematic one). All 3 of our nodes were hit by this problem, but at different times.
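One way to measure how long such a loop lasted on each node is to scan the system log for consecutive single-SSTable compactions. The sketch below is a hypothetical helper, not part of Cassandra or its tooling; its regular expression assumes the `Compacted N sstables to [...]` summary line format shown in the excerpt above and may need adjusting for other versions or log layouts.

```python
import re
import sys
from datetime import datetime

# Assumed format: the "Compacted N sstables to [...]" summary line written
# from CompactionTask.java, as in the log excerpt in this report.
COMPACTED_RE = re.compile(
    r"(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) CompactionTask\.java:\d+ - "
    r"Compacted (?P<n>\d+) sstables to \[(?P<out>[^\]]*)\]"
)


def longest_single_sstable_run(lines):
    """Return (run_length, first_ts, last_ts) for the longest streak of
    consecutive compactions whose input was exactly one SSTable."""
    best = (0, None, None)
    run, start = 0, None
    for line in lines:
        m = COMPACTED_RE.search(line)
        if not m:
            continue  # skip "Compacting [...]" and unrelated lines
        ts = datetime.strptime(m["ts"], "%Y-%m-%d %H:%M:%S,%f")
        if int(m["n"]) == 1:
            run += 1
            if run == 1:
                start = ts  # first compaction of a new streak
            if run > best[0]:
                best = (run, start, ts)
        else:
            run, start = 0, None  # a multi-SSTable compaction ends the streak
    return best


if __name__ == "__main__":
    # Usage (log path is up to the caller): python scan.py <path-to-system.log>
    if len(sys.argv) > 1:
        with open(sys.argv[1], encoding="utf-8", errors="replace") as f:
            run, first, last = longest_single_sstable_run(f)
        print(f"{run} consecutive single-SSTable compactions from {first} to {last}")
```

Using `search` rather than `match` means the helper still finds summary lines even when wrapped output has glued text onto the start of a line.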
We could not figure out how the problematic SSTable came about because the log had wrapped around.
We dumped the records in the SSTable and found that it held the oldest data in our CF (again, our data is time series), and that all the records in it were more than 18 hours old (12 h TTL + 6 h gc_grace), so they should have been dropped. However, C* did nothing to this SSTable except compact it again and again, until enough other SSTables had aged to be considered by DTCS for compaction together with this one.
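The expectation that this SSTable should have been dropped outright can be written as a simplified check. This is a sketch under stated assumptions, not Cassandra's implementation: all times are plain seconds, `SSTableStats` and `fully_expired` are hypothetical names, and the overlap rule (an expired SSTable is only safe to drop if its newest write is older than everything in overlapping SSTables, so it cannot be shadowing anything there) reflects our understanding of how Cassandra decides rather than a quote of its code.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class SSTableStats:
    """Hypothetical per-SSTable metadata; all values in seconds since the epoch."""
    max_local_deletion_time: int  # latest point at which any cell expires
    min_timestamp: int            # oldest write in the SSTable
    max_timestamp: int            # newest write in the SSTable


def fully_expired(sst: SSTableStats, overlapping: Iterable[SSTableStats],
                  now: int, gc_grace: int) -> bool:
    """Simplified model (an assumption, not Cassandra's code) of when a whole
    SSTable could be dropped instead of being rewritten."""
    gc_before = now - gc_grace
    # Every cell must have expired more than gc_grace seconds ago.
    if sst.max_local_deletion_time >= gc_before:
        return False
    # All of its writes must predate every write in overlapping SSTables.
    return all(sst.max_timestamp < other.min_timestamp for other in overlapping)


if __name__ == "__main__":
    TTL, GC_GRACE = 43200, 21600
    now = 1_000_000
    written = now - 19 * 3600  # hypothetical: oldest data written 19 h ago
    oldest = SSTableStats(written + TTL, written, written)
    newer = SSTableStats(now + TTL, now - 600, now)  # hypothetical newer SSTable
    print(fully_expired(oldest, [newer], now, GC_GRACE))  # True
```

For an SSTable like the one described here (the oldest data in an insert-only time-series CF, every cell older than TTL + gc_grace), both conditions would be expected to hold under this model, which is why repeatedly rewriting it rather than dropping it looks wrong.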