Cassandra / CASSANDRA-9661

Endless compaction of a tiny, tombstoned SSTable


Details

    • Normal

    Description

      We deployed a 3-node cluster (Cassandra 2.1.5) under a stable write load (about 2k writes/s) to a CF using DTCS, with a default TTL of 43200 s and gc_grace of 21600 s. The CF contained insert-only, complete time-series data. We found that Cassandra would occasionally keep writing logs like this:

      INFO [CompactionExecutor:30551] 2015-06-26 18:10:06,195 CompactionTask.java:270 - Compacted 1 sstables to [/home/cassandra/workdata/data/sen_vaas_test/nodestatus-f96c7c50155811e589f69752ac9b06c7/sen_vaas_test-nodestatus-ka-2516270,]. 449 bytes to 449 (~100% of original) in 12ms = 0.035683MB/s. 4 total partitions merged to 4. Partition merge counts were

      {1:4, }

      INFO [CompactionExecutor:30551] 2015-06-26 18:10:06,241 CompactionTask.java:140 - Compacting [SSTableReader(path='/home/cassandra/workdata/data/sen_vaas_test/nodestatus-f96c7c50155811e589f69752ac9b06c7/sen_vaas_test-nodestatus-ka-2516270-Data.db')]
      INFO [CompactionExecutor:30551] 2015-06-26 18:10:06,253 CompactionTask.java:270 - Compacted 1 sstables to [/home/cassandra/workdata/data/sen_vaas_test/nodestatus-f96c7c50155811e589f69752ac9b06c7/sen_vaas_test-nodestatus-ka-2516271,]. 449 bytes to 449 (~100% of original) in 12ms = 0.035683MB/s. 4 total partitions merged to 4. Partition merge counts were

      {1:4, }
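For reference, a table matching the setup described above might look like the following. The keyspace and table names come from the log paths; the column layout is assumed, and only the compaction strategy, default TTL, and gc_grace settings are taken from the report:

```cql
-- Hypothetical schema matching the reported setup: DTCS,
-- 12 h default TTL, 6 h gc_grace. Column names are illustrative.
CREATE TABLE sen_vaas_test.nodestatus (
    node_id text,
    ts      timestamp,
    status  text,
    PRIMARY KEY (node_id, ts)
) WITH compaction = {'class': 'DateTieredCompactionStrategy'}
  AND default_time_to_live = 43200   -- 12 hours
  AND gc_grace_seconds = 21600;      -- 6 hours
```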

      It seems that Cassandra kept compacting a single SSTable, several times per second, for many hours. Tons of logs were written and one CPU core was exhausted during this time. The endless compaction finally ended when another compaction started with a group of SSTables (including the previous one). All three of our nodes have been hit by this problem, but at different times.

      We could not figure out how the problematic SSTable came about, because the log had wrapped around.

      We dumped the records in the SSTable and found that it held the oldest data in our CF (again, our data was time series), and all of the records in this SSTable had been expired for more than 18 hours (12 h TTL + 6 h gc_grace), so they should have been dropped. However, Cassandra did nothing with this SSTable but compact it again and again, until more SSTables were out-dated enough to be considered for compaction together with this one by DTCS.
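The 18-hour figure follows from adding the TTL to gc_grace: a TTL'd cell written at time t becomes a tombstone at t + TTL, and that tombstone is safe to purge once gc_grace has also elapsed. A minimal sketch of this arithmetic (the function name and timestamps are illustrative, not Cassandra's actual implementation):

```python
# Sketch of the tombstone-expiry arithmetic described above.
# A TTL'd cell expires at write_ts + ttl; the resulting tombstone
# is droppable once gc_grace has also elapsed.
def fully_droppable(write_ts: int, ttl: int, gc_grace: int, now: int) -> bool:
    """True if a cell written at write_ts (epoch seconds) can be purged at `now`."""
    return now >= write_ts + ttl + gc_grace

# With the reported settings: 12 h TTL + 6 h gc_grace = 18 h.
TTL, GC_GRACE = 43200, 21600
assert fully_droppable(write_ts=0, ttl=TTL, gc_grace=GC_GRACE, now=18 * 3600)
assert not fully_droppable(write_ts=0, ttl=TTL, gc_grace=GC_GRACE, now=18 * 3600 - 1)
```

Everything in the reported SSTable was past this 18-hour cutoff, which is why the reporters expected the whole file to be dropped rather than repeatedly rewritten.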

      Attachments

        Activity

          People

            Assignee: Unassigned
            Reporter: WeiFan (noel2004)
            Votes: 0
            Watchers: 6
