[CASSANDRA-18443] Deadlock updating sstable metadata if disk boundaries need reloading - ASF JIRA

Details

Type: Improvement
Status: Resolved
Priority: Normal
Resolution: Fixed
Fix Version/s: 4.0.10, 4.1.2, 5.0-alpha1, 5.0
Component/s: Local/Compaction, Local/Memtable, Local/SSTable
Labels: None
Change Category: Operability
Complexity: Normal
Platform: All
Impacts: None
Source Control Link: https://github.com/apache/cassandra/commit/cd9bed0aeadd94136a8a6c6ed284cc4684b0666c
Test and Documentation Plan: Run through CI

Description

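CompactionStrategyManager.handleNotification holds the read lock while processing notifications. When handling a metadata-changed notification, it makes an extra call to maybeReloadDiskBoundaries. If the disk boundaries need reloading, that call tries to acquire the write lock; since a ReentrantReadWriteLock read lock cannot be upgraded to the write lock, the thread deadlocks on itself.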

Partial stacktrace

        at jdk.internal.misc.Unsafe.park(java.base@11.0.16/Native Method)
        - parking to wait for  <0x00000005cc000078> (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
        at java.util.concurrent.locks.LockSupport.park
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire
        at java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.lock
        at org.apache.cassandra.db.compaction.CompactionStrategyManager.maybeReloadDiskBoundaries(CompactionStrategyManager.java:495)
        at org.apache.cassandra.db.compaction.CompactionStrategyManager.getCompactionStrategyFor(CompactionStrategyManager.java:343)
        at org.apache.cassandra.db.compaction.CompactionStrategyManager.handleMetadataChangedNotification(CompactionStrategyManager.java:796)
        at org.apache.cassandra.db.compaction.CompactionStrategyManager.handleNotification(CompactionStrategyManager.java:838)
        at org.apache.cassandra.db.lifecycle.Tracker.notifySSTableMetadataChanged(Tracker.java:482)
        at org.apache.cassandra.db.compaction.CompactionStrategyManager.handleNotification(CompactionStrategyManager.java:838)
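
The parked WriteLock.lock frame in the trace above can be reproduced in isolation. The following is a minimal sketch, not Cassandra code: the class and method names are hypothetical, and it uses a timed tryLock in place of lock() so that the demonstration returns instead of hanging.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical demo class; it only models the locking pattern described in this issue.
class ReadToWriteUpgradeDemo {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

    // Tries to take the write lock while this thread already holds the read lock.
    // A plain writeLock().lock() at this point would park forever, as in the stack trace.
    public boolean tryUpgradeWhileReading(long timeoutMillis) throws InterruptedException {
        lock.readLock().lock();
        try {
            boolean acquired = lock.writeLock().tryLock(timeoutMillis, TimeUnit.MILLISECONDS);
            if (acquired) {
                lock.writeLock().unlock();
            }
            return acquired;
        } finally {
            lock.readLock().unlock();
        }
    }

    // Takes the write lock with no read lock held; this succeeds when the lock is uncontended.
    public boolean tryWriteWithoutReading(long timeoutMillis) throws InterruptedException {
        boolean acquired = lock.writeLock().tryLock(timeoutMillis, TimeUnit.MILLISECONDS);
        if (acquired) {
            lock.writeLock().unlock();
        }
        return acquired;
    }

    public static void main(String[] args) throws InterruptedException {
        ReadToWriteUpgradeDemo demo = new ReadToWriteUpgradeDemo();
        System.out.println("upgrade while holding read lock: " + demo.tryUpgradeWhileReading(200));
        System.out.println("write lock with no read lock held: " + demo.tryWriteWithoutReading(200));
    }
}
```

The first call times out because the write lock cannot be granted while any read lock is held, including one held by the requesting thread itself; the second call succeeds.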

Because the deadlocked thread still holds the read lock, the SlabpoolCleaner blocks while notifying ColumnFamilyStore, so memtables cannot be flushed and recycled. Any thread applying a mutation to the database (at least GossipStage and MutationStage) then stalls, which causes the node to be considered down by peers and/or to back up with pending requests.

All the cases investigated occurred during single-sstable upleveling by org.apache.cassandra.db.compaction.SingleSSTableLCSTask, which was added in CASSANDRA-12526.

Other, less critical work was also affected: JMX calls to get estimated remaining compaction tasks, the index summary manager redistributing summaries, the StatusLogger trying to log dropped messages, and the ValidationManager.

The workaround is to reboot the affected host.

The fix is to just remove the redundant disk boundary reload check on that path.
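
The shape of such a fix can be sketched as follows. This is an illustrative model only, not the committed change: every name here is hypothetical, and it assumes (as "redundant" suggests) that a reload check already runs before the read lock is taken, so the lookup performed under the read lock never needs the write lock.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical model of a manager that reloads boundaries under a write lock
// and handles notifications under a read lock.
class BoundaryAwareManager {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private volatile boolean boundariesStale = true;
    private int reloads = 0;

    // Takes the write lock only when a reload is needed.
    // Must never be called by a thread that holds the read lock.
    public void maybeReloadBoundaries() {
        if (!boundariesStale) {
            return;
        }
        lock.writeLock().lock();
        try {
            if (boundariesStale) {
                reloads++;
                boundariesStale = false;
            }
        } finally {
            lock.writeLock().unlock();
        }
    }

    public String handleNotification(String sstable) {
        maybeReloadBoundaries(); // assumed existing check, done before the read lock
        lock.readLock().lock();
        try {
            // No second reload check on this path: only a plain lookup under the read lock.
            return lookupStrategy(sstable);
        } finally {
            lock.readLock().unlock();
        }
    }

    private String lookupStrategy(String sstable) {
        return "strategy-for-" + sstable;
    }

    public void markBoundariesStale() {
        boundariesStale = true;
    }

    public int reloadCount() {
        return reloads;
    }

    public static void main(String[] args) {
        BoundaryAwareManager manager = new BoundaryAwareManager();
        System.out.println(manager.handleNotification("sstable-1"));
        System.out.println("reloads: " + manager.reloadCount());
    }
}
```

The design point is that the write lock is only ever requested by a thread that holds no read lock, so the notification path cannot block on itself.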

People

Assignee: Jon Meredith

Reporter: Jon Meredith

Authors: Jon Meredith

Reviewers: Marcus Eriksson

Votes: 0

Watchers: 4

Dates

Created: 11/Apr/23 17:54

Updated: 12/Sep/23 13:03

Resolved: 21/Apr/23 17:58