Below I will present some live cluster analysis from about 10 days after deploying the original patch, which entirely removes the range-overlap check in worthDroppingTombstones().
In our dataset we use both LCS and STCS, but most of the CFs are STCS. A significant portion of our dataset consists of append-only TTL-ed data, making it a good match for tombstone compaction. Most of our large CFs with a high droppable tombstone ratio use STCS, but there are a few using LCS that also benefited from the patch.
I deployed the patch in 2 different ranges with similar results. The metrics were collected between the 1st and the 16th of May; the nodes were patched on the 7th of May. The Cassandra version used was 1.2.16.
In the analysis I compare the total space used (Cassandra load), tombstone ratio, disk utilization (system disk xvbd util), total bytes compacted, and system load (Linux CPU). For the last three metrics I also compute the integral of the metric over the period, to make the totals easier to compare.
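The integral here is just the area under each sampled metric curve, so two nodes' totals over the same window can be compared as single numbers. A minimal sketch of that computation (hypothetical sample data, trapezoidal rule; not part of any Cassandra tooling):

```java
// Sketch: approximate the integral of a sampled metric (e.g. bytes
// compacted per second) with the trapezoidal rule, so totals over the
// same period can be compared across nodes. Illustrative only.
public class MetricIntegral {
    // times in seconds and values in the metric's unit; arrays have the
    // same length and times are sorted ascending
    static double integrate(double[] times, double[] values) {
        double total = 0.0;
        for (int i = 1; i < times.length; i++)
            total += (times[i] - times[i - 1]) * (values[i] + values[i - 1]) / 2.0;
        return total;
    }

    public static void main(String[] args) {
        double[] t = {0, 60, 120, 180};          // one sample per minute
        double[] v = {10.0, 12.0, 11.0, 13.0};   // e.g. MB/s compacted
        System.out.println(integrate(t, v));     // total MB over 3 minutes
    }
}
```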
Each graph compares the metrics of the patched node with its previous and next neighbors; vnodes are not used. So the first row in the figure is node N-1, the second row is node N (the patched node, marked with an asterisk), and the third row is node N+1.
- Cassandra load: In the patched node, it's possible to see a sudden 7% decrease in disk space when the patch was applied, due to the execution of single-SSTable compactions. The growth rate of disk usage also decreased after the patch, since tombstones are cleared more often. Over the whole period, disk space grew 1.2% on the patched node, against about 10% on the unpatched nodes.
- Tombstone ratio: After the patch is applied, it's possible to see a decrease in the droppable tombstone ratio, which then hovers around the default threshold of 20%. The droppable tombstone ratio of the unpatched nodes remains high for most CFs, which indicates that tombstone compactions are not being triggered at all.
- Disk utilization: it's not possible to detect any change in the disk utilization pattern after the patch is applied, which suggests that I/O is not affected by the patch, at least for our mixed dataset. I double checked the IOPS graph for the period and there was not even a slight sign of change in the I/O pattern after the patch was applied. (https://issues.apache.org/jira/secure/attachment/12645312/patch-v1-iostat.png)
- Total bytes compacted: The number of compacted bytes in the patched node was about 17% higher over the period. About 7% is due to the initial tombstones that were cleared, and another 7% due to tombstones cleared after the patch was applied (the difference between the two nodes' sizes). The remaining 3% can be attributed to unnecessary compactions plus normal variation between node ranges.
- System CPU load: it was not affected by the patch.
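For context on the 20% level the tombstone ratio settles around: a single-SSTable tombstone compaction is only considered when the estimated droppable tombstone ratio exceeds the tombstone_threshold compaction option, which defaults to 0.2. A simplified sketch of that ratio check (illustrative names, not the actual Cassandra code):

```java
// Simplified sketch of the threshold part of the single-SSTable
// tombstone-compaction decision. Names are illustrative; this is not
// the real worthDroppingTombstones() implementation.
public class TombstoneThreshold {
    static final double TOMBSTONE_THRESHOLD = 0.20; // Cassandra default

    // droppableTombstones: estimated tombstones already past gc_grace;
    // totalColumns: estimated column count in the SSTable
    static boolean passesRatioCheck(long droppableTombstones, long totalColumns) {
        if (totalColumns == 0)
            return false;
        double ratio = (double) droppableTombstones / totalColumns;
        return ratio > TOMBSTONE_THRESHOLD;
    }

    public static void main(String[] args) {
        System.out.println(passesRatioCheck(30, 100)); // true: 30% > 20%
        System.out.println(passesRatioCheck(10, 100)); // false: 10% <= 20%
    }
}
```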
I implemented another version of the patch (v2), as suggested by Marcus Eriksson, which, instead of dropping the overlap check entirely, only performs the check against SSTables containing rows with smaller timestamps than the candidate SSTable (https://issues.apache.org/jira/secure/attachment/12645316/1.2.16-CASSANDRA-6563-v2.txt).
One week ago I deployed this alternative patch on 2 of our production nodes, and unfortunately loosening the checks did not achieve significant results. I added some debug logging to the code, and what I verified is that, despite the reduced number of SSTables to compare against, even if only one SSTable has a column with an equal or lower timestamp than the candidate SSTable, the token ranges of these SSTables always overlap because of the RandomPartitioner. So this supports the claim that even with loosened checks, single-SSTable tombstone compaction is almost never triggered, at least in the use cases that could benefit from it.
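To make the failure mode concrete: under RandomPartitioner, every flushed SSTable tends to span almost the entire token space, so the [minToken, maxToken] intervals of any two SSTables intersect and the overlap check rejects the candidate even after the timestamp filter. A toy sketch of the v2-style check (illustrative names and types, not the actual Cassandra implementation):

```java
import java.util.List;

// Toy model of the v2-style check: only SSTables holding data older than
// the candidate are considered for the overlap test, but with random
// partitioning their token ranges still almost always intersect.
// Illustrative only; not the real worthDroppingTombstones() code.
public class OverlapCheck {
    static class SSTable {
        final long minToken, maxToken;   // token range covered by the SSTable
        final long minTimestamp;         // oldest column in the SSTable
        SSTable(long minToken, long maxToken, long minTimestamp) {
            this.minToken = minToken;
            this.maxToken = maxToken;
            this.minTimestamp = minTimestamp;
        }
        boolean overlaps(SSTable other) {
            return minToken <= other.maxToken && other.minToken <= maxToken;
        }
    }

    static boolean worthDroppingTombstones(SSTable candidate, List<SSTable> others,
                                           long candidateMaxTimestamp) {
        for (SSTable s : others)
            // v2 filter: skip SSTables whose data is strictly newer
            if (s.minTimestamp <= candidateMaxTimestamp && s.overlaps(candidate))
                return false; // older overlapping data: compaction rejected
        return true;
    }

    public static void main(String[] args) {
        // RandomPartitioner spreads keys over the full token space, so both
        // SSTables cover nearly [0, 1000] and always overlap.
        SSTable candidate = new SSTable(1, 999, 50);
        SSTable older = new SSTable(0, 998, 10);
        System.out.println(worthDroppingTombstones(candidate, List.of(older), 100));
        // prints false: the tombstone compaction is never triggered
    }
}
```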
The graphs for the alternative patch analysis can be found here: https://issues.apache.org/jira/secure/attachment/12645240/patch-v2-range3.png