Uploaded image for project: 'Cassandra'
  1. Cassandra
  2. CASSANDRA-13422

CompactionStrategyManager should take write not read lock when handling remove notifications



    • Severity:


      getNextBackgroundTask in various compaction strategies (definitely LCS) rely on checking the result of DataTracker.getCompacting() to avoid accessing data and metadata related to tables that have already head their resources released.

      There is a race where this check is unreliable and will claim a table that has its resources already released is not compacting resulting in use after free.

      LeveledCompactionStrategy.findDroppableSSTable for instance has this three part logical && condition where the first check is against the compacting set before calling worthDroppingTombstones which fails if the table has been released.

      The order of events is basically that CompactionStrategyManager acquires the read lock in getNextBackgroundTask(), then proceeds eventually to findDroppableSSTable and acquires a set of SSTables from the manifest. While the manifest is thread safe it's not accessed atomically WRT to other operations. Once it has acquired the set of tables it acquires (not atomically) the set of compacting SSTables and iterates checking the former against the latter.

      Meanwhile other compaction threads are marking tables obsolete or compacted and releasing their references. Doing this removes them from DataTracker and publishes a notification to the strategies, but this notification only requires the read lock. After the compaction thread has published the notifications it eventually marks the table as not compacting in DataTracker or removes it entirely.

      The race is then that the compaction thread generating a new background task acquires the sstables from the manifest on the stack. Any table in that set that was compacting at that time must remain compacting so that it can be skipped. Another compaction thread finishes a compaction and is able to remove the table from the manifest and then remove it from the compacting set. The thread generating the background task then acquires the list of compacting tables which doesn't include the table it is supposed to skip.

      The simple fix appears to be to require threads to acquire the write lock in order to publish notifications of tables being removed from compaction strategies. While holding the write lock it won't be possible for someone to see a view of tables in the manifest where tables that are compacting aren't compacting in the view.




            • Assignee:
              aweisberg Ariel Weisberg
              aweisberg Ariel Weisberg
              Ariel Weisberg
              Marcus Eriksson
            • Votes:
              0 Vote for this issue
              3 Start watching this issue


              • Created: