Cassandra / CASSANDRA-8359

Make DTCS consider removing SSTables much more frequently

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Low
    • Resolution: Fixed
    • Fix Version/s: 2.0.15, 2.1.5, 2.2.0 beta 1
    • None

    Description

      When I run DTCS on a table where every value has the same TTL, SSTables become completely expired but stay on disk much longer than they need to. I've applied CASSANDRA-8243, but it makes no apparent difference (probably because the affected SSTables are purged via compaction anyway, if not by dropping them directly).

      Disk size graphs clearly show that tombstones are only removed when the oldest SSTable participates in a compaction. In the long run, size on disk keeps growing. This should not have to happen: disk usage should easily stay constant, because DTCS separates the expired data from the rest.

      I think checks for whether SSTables can be dropped should happen much more frequently. This probably only needs to be tweaked for DTCS, but perhaps there is a more general place to put it. My thinking is that DTCS should check which SSTables can be dropped on every call to getNextBackgroundTask. That would be something like a call to CompactionController.getFullyExpiredSSTables with all non-compacting SSTables passed in as "compacting" and all other SSTables passed in as "overlapping". The returned SSTables, if any, are then added to whichever set of SSTables DTCS decides to compact. Before the compaction runs, Cassandra makes another call to CompactionController.getFullyExpiredSSTables, where it will see that it can simply drop them.
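
      The check described above can be sketched roughly as follows. This is a simplified, standalone model and not Cassandra's actual code: the SSTable class, the fullyExpired and withExpired methods, and all field names are hypothetical stand-ins, and the expiry rule only approximates the idea behind CompactionController.getFullyExpiredSSTables (every cell is past gcBefore, and the SSTable cannot shadow data in an SSTable that stays).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Simplified model of the proposal; none of these types are Cassandra's own. */
final class DtcsExpiryCheckSketch {

    /** Hypothetical SSTable summary holding only the metadata the check needs. */
    static final class SSTable {
        final String name;
        final long minTimestamp;        // oldest write timestamp in the file
        final long maxTimestamp;        // newest write timestamp in the file
        final int maxLocalDeletionTime; // latest expiry/deletion time of any cell, in seconds

        SSTable(String name, long minTimestamp, long maxTimestamp, int maxLocalDeletionTime) {
            this.name = name;
            this.minTimestamp = minTimestamp;
            this.maxTimestamp = maxTimestamp;
            this.maxLocalDeletionTime = maxLocalDeletionTime;
        }
    }

    /** Assumed rule: droppable if fully past gcBefore and older than everything that remains. */
    static Set<SSTable> fullyExpired(Iterable<SSTable> compacting, Iterable<SSTable> overlapping, int gcBefore) {
        long minTimestamp = Long.MAX_VALUE;
        for (SSTable s : overlapping)
            minTimestamp = Math.min(minTimestamp, s.minTimestamp);

        List<SSTable> candidates = new ArrayList<>();
        for (SSTable s : compacting) {
            if (s.maxLocalDeletionTime < gcBefore)
                candidates.add(s);
            else
                minTimestamp = Math.min(minTimestamp, s.minTimestamp); // this one stays on disk
        }

        // Keep candidates that might still shadow data in an SSTable that stays.
        Set<SSTable> expired = new HashSet<>();
        for (SSTable c : candidates)
            if (c.maxTimestamp < minTimestamp)
                expired.add(c);
        return expired;
    }

    /** Proposed step: add droppable SSTables to whatever DTCS chose to compact. */
    static Set<SSTable> withExpired(Set<SSTable> chosenByDtcs, Set<SSTable> live,
                                    Set<SSTable> currentlyCompacting, int gcBefore) {
        // Non-compacting SSTables play the "compacting" role, the rest are "overlapping".
        Set<SSTable> nonCompacting = new HashSet<>(live);
        nonCompacting.removeAll(currentlyCompacting);
        Set<SSTable> result = new HashSet<>(chosenByDtcs);
        result.addAll(fullyExpired(nonCompacting, currentlyCompacting, gcBefore));
        return result;
    }

    public static void main(String[] args) {
        SSTable old = new SSTable("old", 1, 10, 100);
        SSTable mid = new SSTable("mid", 11, 20, 200);
        SSTable fresh = new SSTable("fresh", 21, 30, 1000);
        Set<SSTable> live = new HashSet<>(Arrays.asList(old, mid, fresh));
        Set<SSTable> chosen = new HashSet<>(Arrays.asList(fresh));
        for (SSTable s : withExpired(chosen, live, new HashSet<>(), 250))
            System.out.println(s.name);
    }
}
```

      In this model the expired SSTables simply ride along with the next compaction that DTCS selects, so they no longer have to wait until the oldest SSTable is picked for compaction on its own.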

      This approach is somewhat redundant, since it calls CompactionController.getFullyExpiredSSTables twice. Avoiding that would require changing the code path that decides which SSTables to drop.

      (A slight tangent: I'm also thinking that tombstone compactions could be considered more often in DTCS, maybe even some kind of multi-SSTable tombstone compaction involving the oldest couple of SSTables...)

          People

            Assignee: Björn Hegerfors (Bj0rn)
            Reporter: Björn Hegerfors (Bj0rn)
            Björn Hegerfors
            Marcus Eriksson
            Shawn Kumar
            Votes: 2
            Watchers: 9