Apache Cassandra / CASSANDRA-18507

Partial compaction can resurrect deleted data


Details

    Description

      If there isn't enough disk space available to compact all existing sstables, Cassandra attempts a partial compaction by removing sstables from the set of candidate sstables to be compacted, starting with the largest one. The sstable removed from the set may contain data for which there are tombstones in another (more recent) sstable. The overlaps between sstables are computed when the CompactionController is created, and the CompactionController is created before any sstables are removed from the set to be compacted, so the computed overlaps are outdated by the time Cassandra checks which sstables are covered by a given tombstone. This leads to the faulty conclusion that the tombstones can be pruned during the compaction, which resurrects the deleted data.

      The issue is present in Cassandra 4.0 and 4.1. Cassandra 3.11 creates the CompactionController after the set of sstables to compact has been reduced, and is thus not affected. trunk does not appear to support partial compactions at all, but instead refuses to compact when the disk is full.
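
The ordering problem described above can be illustrated with a small toy model. This is only a sketch under simplifying assumptions: `PartialCompactionSketch`, `SSTable`, `overlapping`, `dropLargest`, `canPurge`, `purgedTombstones` and `sample` are hypothetical names, not Cassandra classes or methods, and the purge check is reduced to a single timestamp comparison (the real check involves more conditions). The flag `overlapsBeforeReduction` selects between computing the overlaps before the candidate set is reduced (the 4.0/4.1 ordering described in this report) and after it (the 3.11 ordering).

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy model only: none of these types are Cassandra classes.
class PartialCompactionSketch {

    static final class SSTable {
        final String name;
        final long sizeBytes;
        final Map<String, Long> live;       // key -> write timestamp
        final Map<String, Long> tombstones; // key -> deletion timestamp

        SSTable(String name, long sizeBytes, Map<String, Long> live, Map<String, Long> tombstones) {
            this.name = name;
            this.sizeBytes = sizeBytes;
            this.live = live;
            this.tombstones = tombstones;
        }
    }

    // SSTables that are not being compacted and may still hold shadowed data.
    static List<SSTable> overlapping(List<SSTable> all, List<SSTable> compacting) {
        List<SSTable> result = new ArrayList<>(all);
        result.removeAll(compacting);
        return result;
    }

    // Partial compaction: drop the largest candidate to fit the available disk space.
    static List<SSTable> dropLargest(List<SSTable> candidates) {
        List<SSTable> reduced = new ArrayList<>(candidates);
        reduced.remove(Collections.max(candidates, Comparator.comparingLong((SSTable t) -> t.sizeBytes)));
        return reduced;
    }

    // Simplified rule: a tombstone may be purged only if no overlapping sstable holds older data for the key.
    static boolean canPurge(String key, long deletedAt, List<SSTable> overlaps) {
        for (SSTable t : overlaps) {
            Long writtenAt = t.live.get(key);
            if (writtenAt != null && writtenAt < deletedAt) {
                return false;
            }
        }
        return true;
    }

    // Returns the keys whose tombstones the (toy) compaction would purge.
    static Set<String> purgedTombstones(List<SSTable> all, boolean overlapsBeforeReduction) {
        List<SSTable> compacting = new ArrayList<>(all);
        // Computed against the full candidate set, so it is empty at this point.
        List<SSTable> overlaps = overlapping(all, compacting);
        compacting = dropLargest(compacting);
        if (!overlapsBeforeReduction) {
            // Recompute after the reduction so the dropped sstable counts as overlapping.
            overlaps = overlapping(all, compacting);
        }
        Set<String> purged = new TreeSet<>();
        for (SSTable t : compacting) {
            for (Map.Entry<String, Long> e : t.tombstones.entrySet()) {
                if (canPurge(e.getKey(), e.getValue(), overlaps)) {
                    purged.add(e.getKey());
                }
            }
        }
        return purged;
    }

    // A large old sstable with data for "k" and a small newer one that deletes "k".
    static List<SSTable> sample() {
        SSTable bigOld = new SSTable("big-old", 1000, Map.of("k", 1L), Map.of());
        SSTable smallNew = new SSTable("small-new", 10, Map.of(), Map.of("k", 2L));
        SSTable smallOther = new SSTable("small-other", 20, Map.of("x", 3L), Map.of());
        return List.of(bigOld, smallNew, smallOther);
    }
}
```

In this model, `purgedTombstones(sample(), true)` purges the tombstone for `k` even though `big-old` was dropped from the compaction and still holds the deleted value, whereas `purgedTombstones(sample(), false)` keeps the tombstone.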

      This regression appears to have been introduced by CASSANDRA-13068.


          People

            toblin Tobias Lindaaker
            toblin Tobias Lindaaker
            Tobias Lindaaker
            David Capwell, Marcus Eriksson
