[CASSANDRA-8641] Repair causes a large number of tiny SSTables - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Normal
Resolution: Fixed
Fix Version/s: 2.1.3
Component/s: None
Labels: None
Environment: Ubuntu 14.04
Severity: Normal

Description

I have a 3-node cluster with RF = 3; each node has a quad-core CPU and 32 GB of RAM. I am running 2.1.2 with all the default settings. I'm seeing some strange behavior during incremental repair (under write load).

Taking the example of one particular column family, before running an incremental repair, I have about 13 SSTables. After finishing the incremental repair, I have over 114000 SSTables.

Table: customers
SSTable count: 114688
Space used (live): 97203707290
Space used (total): 99175455072
Space used by snapshots (total): 0
SSTable Compression Ratio: 0.28281112416526505
Memtable cell count: 0
Memtable data size: 0
Memtable switch count: 1069
Local read count: 0
Local read latency: NaN ms
Local write count: 11548705
Local write latency: 0.030 ms
Pending flushes: 0
Bloom filter false positives: 0
Bloom filter false ratio: 0.00000
Bloom filter space used: 144145152
Compacted partition minimum bytes: 311
Compacted partition maximum bytes: 1996099046
Compacted partition mean bytes: 3419
Average live cells per slice (last five minutes): 0.0
Maximum live cells per slice (last five minutes): 0.0
Average tombstones per slice (last five minutes): 0.0
Maximum tombstones per slice (last five minutes): 0.0
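As a rough sanity check on the figures above, the mean SSTable size can be derived from the reported counters. This is only a sketch using the two values copied from the stats output; the variable names are illustrative, and the mean alone does not show how sizes are distributed across the SSTables.

```python
# Values copied from the "Table: customers" stats above.
sstable_count = 114688
space_used_live_bytes = 97203707290

# Mean on-disk size per SSTable, in bytes.
avg_sstable_bytes = space_used_live_bytes / sstable_count

print(f"SSTables: {sstable_count}")
print(f"Mean SSTable size: {avg_sstable_bytes / 1024:.1f} KiB")
```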

Looking at the logs during the repair, it seems Cassandra is struggling to compact minuscule SSTables (often just a few kilobytes in total per compaction):

INFO  [CompactionExecutor:337] 2015-01-17 01:44:27,011 CompactionTask.java:251 - Compacted 32 sstables to [/mnt/data/cassandra/data/business/customers-d9d42d209ccc11e48ca54553c90a9d45/business-customers-ka-228341,].  8,332 bytes to 6,547 (~78% of original) in 80,476ms = 0.000078MB/s.  32 total partitions merged to 32.  Partition merge counts were {1:32, }
INFO  [CompactionExecutor:337] 2015-01-17 01:45:35,519 CompactionTask.java:251 - Compacted 32 sstables to [/mnt/data/cassandra/data/business/customers-d9d42d209ccc11e48ca54553c90a9d45/business-customers-ka-229348,].  8,384 bytes to 6,563 (~78% of original) in 6,880ms = 0.000910MB/s.  32 total partitions merged to 32.  Partition merge counts were {1:32, }
INFO  [CompactionExecutor:339] 2015-01-17 01:47:46,475 CompactionTask.java:251 - Compacted 32 sstables to [/mnt/data/cassandra/data/business/customers-d9d42d209ccc11e48ca54553c90a9d45/business-customers-ka-229351,].  8,423 bytes to 6,401 (~75% of original) in 10,416ms = 0.000586MB/s.  32 total partitions merged to 32.  Partition merge counts were {1:32, }
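To make the numbers in these compaction lines easier to compare, the sketch below parses them and recomputes the throughput. The regular expression and the helper names (`parse_compaction`, `to_int`) are assumptions written for this log format as it appears in the excerpt, not part of Cassandra. Dividing the output bytes by 1024 * 1024 and by the elapsed seconds reproduces the logged MB/s values for these three lines.

```python
import re

# Log lines copied verbatim from the excerpt above.
LOG_LINES = [
    "INFO  [CompactionExecutor:337] 2015-01-17 01:44:27,011 CompactionTask.java:251 - Compacted 32 sstables to [/mnt/data/cassandra/data/business/customers-d9d42d209ccc11e48ca54553c90a9d45/business-customers-ka-228341,].  8,332 bytes to 6,547 (~78% of original) in 80,476ms = 0.000078MB/s.  32 total partitions merged to 32.  Partition merge counts were {1:32, }",
    "INFO  [CompactionExecutor:337] 2015-01-17 01:45:35,519 CompactionTask.java:251 - Compacted 32 sstables to [/mnt/data/cassandra/data/business/customers-d9d42d209ccc11e48ca54553c90a9d45/business-customers-ka-229348,].  8,384 bytes to 6,563 (~78% of original) in 6,880ms = 0.000910MB/s.  32 total partitions merged to 32.  Partition merge counts were {1:32, }",
    "INFO  [CompactionExecutor:339] 2015-01-17 01:47:46,475 CompactionTask.java:251 - Compacted 32 sstables to [/mnt/data/cassandra/data/business/customers-d9d42d209ccc11e48ca54553c90a9d45/business-customers-ka-229351,].  8,423 bytes to 6,401 (~75% of original) in 10,416ms = 0.000586MB/s.  32 total partitions merged to 32.  Partition merge counts were {1:32, }",
]

# Hypothetical pattern matching the CompactionTask line format shown above.
PATTERN = re.compile(
    r"Compacted (\d+) sstables to \[.*?\]\.\s+"
    r"([\d,]+) bytes to ([\d,]+) \(~\d+% of original\) "
    r"in ([\d,]+)ms"
)


def to_int(text):
    """Convert a number with thousands separators, e.g. '8,332', to int."""
    return int(text.replace(",", ""))


def parse_compaction(line):
    """Return a dict of the fields in one compaction line, or None."""
    match = PATTERN.search(line)
    if match is None:
        return None
    bytes_out = to_int(match.group(3))
    millis = to_int(match.group(4))
    return {
        "sstables": int(match.group(1)),
        "bytes_in": to_int(match.group(2)),
        "bytes_out": bytes_out,
        "millis": millis,
        # Output bytes per second, expressed in units of 1024 * 1024 bytes.
        "mb_per_s": bytes_out / (1024 * 1024) / (millis / 1000),
    }


results = [parse_compaction(line) for line in LOG_LINES]
for r in results:
    print(f"{r['sstables']} sstables, {r['bytes_in']} bytes in, "
          f"{r['millis'] / 1000:.1f} s, {r['mb_per_s']:.6f} MB/s")
```

Each of these compactions merges 32 SSTables holding roughly 8 KB of data in total, yet takes between about 7 and 80 seconds.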

Here is an excerpt of the system logs showing the abnormal flushing:

INFO  [AntiEntropyStage:1] 2015-01-17 15:28:43,807 ColumnFamilyStore.java:840 - Enqueuing flush of customers: 634484 (0%) on-heap, 2599489 (0%) off-heap
INFO  [AntiEntropyStage:1] 2015-01-17 15:29:06,823 ColumnFamilyStore.java:840 - Enqueuing flush of levels: 129504 (0%) on-heap, 222168 (0%) off-heap
INFO  [AntiEntropyStage:1] 2015-01-17 15:29:07,940 ColumnFamilyStore.java:840 - Enqueuing flush of chain: 4508 (0%) on-heap, 6880 (0%) off-heap
INFO  [AntiEntropyStage:1] 2015-01-17 15:29:08,124 ColumnFamilyStore.java:840 - Enqueuing flush of invoices: 1469772 (0%) on-heap, 2542675 (0%) off-heap
INFO  [AntiEntropyStage:1] 2015-01-17 15:29:09,471 ColumnFamilyStore.java:840 - Enqueuing flush of customers: 809844 (0%) on-heap, 3364728 (0%) off-heap
INFO  [AntiEntropyStage:1] 2015-01-17 15:29:24,368 ColumnFamilyStore.java:840 - Enqueuing flush of levels: 28212 (0%) on-heap, 44220 (0%) off-heap
INFO  [AntiEntropyStage:1] 2015-01-17 15:29:24,822 ColumnFamilyStore.java:840 - Enqueuing flush of chain: 860 (0%) on-heap, 1130 (0%) off-heap
INFO  [AntiEntropyStage:1] 2015-01-17 15:29:24,985 ColumnFamilyStore.java:840 - Enqueuing flush of invoices: 334480 (0%) on-heap, 568959 (0%) off-heap
INFO  [AntiEntropyStage:1] 2015-01-17 15:29:27,375 ColumnFamilyStore.java:840 - Enqueuing flush of customers: 221568 (0%) on-heap, 929962 (0%) off-heap
INFO  [AntiEntropyStage:1] 2015-01-17 15:29:35,755 ColumnFamilyStore.java:840 - Enqueuing flush of invoices: 7916 (0%) on-heap, 11080 (0%) off-heap
INFO  [AntiEntropyStage:1] 2015-01-17 15:29:36,239 ColumnFamilyStore.java:840 - Enqueuing flush of customers: 9968 (0%) on-heap, 33041 (0%) off-heap
INFO  [AntiEntropyStage:1] 2015-01-17 15:29:37,935 ColumnFamilyStore.java:840 - Enqueuing flush of invoices: 42108 (0%) on-heap, 69494 (0%) off-heap
INFO  [AntiEntropyStage:1] 2015-01-17 15:29:41,182 ColumnFamilyStore.java:840 - Enqueuing flush of customers: 40936 (0%) on-heap, 159099 (0%) off-heap
INFO  [AntiEntropyStage:1] 2015-01-17 15:29:49,573 ColumnFamilyStore.java:840 - Enqueuing flush of levels: 17236 (0%) on-heap, 27048 (0%) off-heap
INFO  [AntiEntropyStage:1] 2015-01-17 15:29:50,440 ColumnFamilyStore.java:840 - Enqueuing flush of chain: 548 (0%) on-heap, 630 (0%) off-heap
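The flush sizes in this excerpt can be summarized per table with a small script. The following is a minimal sketch, assuming the "Enqueuing flush of" line format shown above; the regular expression and the `summarize_flushes` helper are hypothetical, and only four of the lines from the excerpt are included to keep it short.

```python
import re
from collections import defaultdict

# A subset of the log lines above, copied verbatim.
LOG_LINES = [
    "INFO  [AntiEntropyStage:1] 2015-01-17 15:28:43,807 ColumnFamilyStore.java:840 - Enqueuing flush of customers: 634484 (0%) on-heap, 2599489 (0%) off-heap",
    "INFO  [AntiEntropyStage:1] 2015-01-17 15:29:24,822 ColumnFamilyStore.java:840 - Enqueuing flush of chain: 860 (0%) on-heap, 1130 (0%) off-heap",
    "INFO  [AntiEntropyStage:1] 2015-01-17 15:29:36,239 ColumnFamilyStore.java:840 - Enqueuing flush of customers: 9968 (0%) on-heap, 33041 (0%) off-heap",
    "INFO  [AntiEntropyStage:1] 2015-01-17 15:29:50,440 ColumnFamilyStore.java:840 - Enqueuing flush of chain: 548 (0%) on-heap, 630 (0%) off-heap",
]

# Hypothetical pattern for the flush line format shown in the excerpt.
FLUSH_PATTERN = re.compile(
    r"Enqueuing flush of (\w+): (\d+) \(\d+%\) on-heap, (\d+) \(\d+%\) off-heap"
)


def summarize_flushes(lines):
    """Group flush sizes (on-heap + off-heap bytes) by table name."""
    sizes = defaultdict(list)
    for line in lines:
        match = FLUSH_PATTERN.search(line)
        if match:
            table = match.group(1)
            sizes[table].append(int(match.group(2)) + int(match.group(3)))
    return dict(sizes)


summary = summarize_flushes(LOG_LINES)
for table, flushes in sorted(summary.items()):
    print(f"{table}: {len(flushes)} flushes, "
          f"smallest {min(flushes)} bytes, largest {max(flushes)} bytes")
```

Run against the full log, a summary like this would show how many flushes each table receives during the repair and how small the flushed memtables are.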

At the end of the repair, the cluster has become unusable.

Issue Links

duplicates

CASSANDRA-8267 Only stream from unrepaired sstables during incremental repair (Resolved)
CASSANDRA-5220 Repair improvements when using vnodes (Resolved)
