Details
Type: Improvement
Status: Resolved
Priority: Low
Resolution: Won't Fix
Description
In my very long post on CASSANDRA-6602, I mentioned a more aggressive windowing strategy, which looks for opportunities to compact into larger SSTables sooner. In the original approach, when min_threshold windows of the same size exist and another window of that size appears next to them, those min_threshold windows (not including the newest addition) merge. The new approach doesn't wait for the (min_threshold+1)th window: as soon as min_threshold windows of one size exist, they merge at once. The only exception is the "incoming window", which is excluded from merging with other windows until it is no longer the incoming window.
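The difference between the two triggers can be sketched in a few lines. This is only an illustration of the rule as described here, not the DTCS code; the function name `windows_to_merge` and its parameters are hypothetical.

```python
def windows_to_merge(same_size_count, min_threshold, aggressive):
    """Return how many same-size windows merge right now (0 if none).

    same_size_count must exclude the incoming window, which never
    takes part in a merge while it is still the incoming window.
    (Hypothetical helper, not part of Cassandra.)
    """
    # Original approach: wait for one extra window of the same size.
    # Aggressive approach: merge as soon as min_threshold windows exist.
    trigger = min_threshold if aggressive else min_threshold + 1
    return min_threshold if same_size_count >= trigger else 0


print(windows_to_merge(4, 4, aggressive=False))  # original approach
print(windows_to_merge(4, 4, aggressive=True))   # new approach
```

With four same-size windows and min_threshold 4, the original rule still waits, while the aggressive rule merges all four immediately.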
This does mean that occasionally more than min_threshold SSTables, not all of similar size, get compacted together, and that is intentional. For example, say min_threshold is 4. If we have 3 windows of size 16, 3 windows of size 4 and 3 windows of size 1, and we then get a 4th size 1 window that isn't the incoming window, we immediately merge all of those into one size 64 window: the four size 1 windows form a 4th size 4 window, the four size 4 windows form a 4th size 16 window, and the four size 16 windows form the size 64 window. Typically we expect each window to hold one SSTable whose file size corresponds to the window size in some unit of measure, so we merge roughly 10 SSTables in that scenario.
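The cascade in that example can be checked with a small simulation. It is a minimal sketch under stated assumptions: windows are given oldest first with the incoming window left out, each window holds one SSTable, and equal-size windows are always adjacent at the newest end. The function name `aggressive_merge` is hypothetical and this is not the actual DTCS implementation.

```python
def aggressive_merge(window_sizes, min_threshold=4):
    """Simulate the cascading merge described above.

    window_sizes: sizes of the non-incoming windows, oldest first.
    Returns (resulting_sizes, sstables_compacted), assuming one SSTable
    per original window. (Hypothetical sketch, not Cassandra code.)
    """
    # Each entry is (window size, number of original windows it contains).
    windows = [(size, 1) for size in window_sizes]
    compacted = 0
    while len(windows) >= min_threshold:
        tail = windows[-min_threshold:]
        # Stop once the newest min_threshold windows differ in size.
        if len({size for size, _ in tail}) != 1:
            break
        merged_size = tail[0][0] * min_threshold
        merged_count = sum(count for _, count in tail)
        windows[-min_threshold:] = [(merged_size, merged_count)]
        compacted = merged_count
    return [size for size, _ in windows], compacted


sizes, compacted = aggressive_merge([16, 16, 16, 4, 4, 4, 1, 1, 1, 1])
print(sizes, compacted)
```

For the example above this yields a single size 64 window built from 10 original windows. With only three size 1 windows, nothing merges.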
These bigger compactions happen rarely, about as often as the similar event in STCS, where on occasion the number of SSTables gets very small. This tweak to DTCS is meant to mimic that STCS behavior. It has been observed that DTCS typically has 50% to 100% more SSTables than STCS, so this is a way to counter that.