Tested on version 2.2.11 (the related code path appears to be unchanged in trunk 3.x), using nodetool refresh to restore a snapshot.
Calling the StorageService.loadNewSSTables function results in a deadlock with the background compaction task.
In the StorageService class, the function public void loadNewSSTables(String ksName, String cfName) calls, in the ColumnFamilyStore class, the function public static synchronized void loadNewSSTables(String ksName, String cfName), which in turn calls, in the Keyspace class, the function public static Keyspace open(String keyspaceName).
That reaches the function private static Keyspace open(String keyspaceName, Schema schema, boolean loadSSTables),
which finally tries to acquire a lock via synchronized (Keyspace.class).
So while holding the ColumnFamilyStore class lock (taken by the static synchronized method), the thread attempts to acquire the lock on Keyspace.class.
At the same time, the OptionalTasks thread is executing the ColumnFamilyStore.getBackgroundCompactionTaskSubmitter() task.
That task also calls the Keyspace.open function and has already progressed as far as acquiring the lock on Keyspace.class.
But the call then initializes the column families, and so calls, on the ColumnFamilyStore class, public static synchronized ColumnFamilyStore createColumnFamilyStore ..., which needs the ColumnFamilyStore class lock.
Result: the external call to loadNewSSTables and the internal background compaction task block each other.
So function 1 (loadNewSSTables) locks A (ColumnFamilyStore.class) and then B (Keyspace.class),
and function 2 (the background compaction task) locks B and then A,
leading to a deadlock (due to inconsistent lock ordering).
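
The A-then-B versus B-then-A pattern can be reproduced outside Cassandra with a minimal Java sketch. Nothing below is Cassandra code: CfsLock and KeyspaceLock are hypothetical stand-ins for the ColumnFamilyStore.class and Keyspace.class monitors, and the thread names are made up for illustration. A latch forces both threads to hold their first lock before either asks for the second, so the interleaving described above happens every time, and the JVM's own deadlock detection (ThreadMXBean.findMonitorDeadlockedThreads) is used to confirm it.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.CountDownLatch;

public class LockOrderDeadlockSketch {
    // Hypothetical stand-ins: their Class objects play the role of the
    // ColumnFamilyStore.class and Keyspace.class monitors from the report.
    static final class CfsLock {}
    static final class KeyspaceLock {}

    // Builds a daemon thread that takes `first`, waits until the other
    // thread also holds its first lock, then tries to take `second`.
    static Thread worker(String name, Object first, Object second,
                         CountDownLatch bothHoldFirst) {
        Thread t = new Thread(() -> {
            synchronized (first) {
                bothHoldFirst.countDown();
                try {
                    bothHoldFirst.await();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                synchronized (second) {
                    // Not reached: the other thread holds `second`.
                }
            }
        }, name);
        t.setDaemon(true); // let the JVM exit despite the stuck threads
        return t;
    }

    // Starts the two conflicting threads and polls the JVM for a monitor
    // deadlock for up to about five seconds. Returns the number of threads
    // in the detected cycle, or 0 if none was found.
    static int countDeadlockedThreads() {
        CountDownLatch latch = new CountDownLatch(2);
        // "Function 1": A (CfsLock) then B (KeyspaceLock).
        worker("refresh-sketch", CfsLock.class, KeyspaceLock.class, latch).start();
        // "Function 2": B (KeyspaceLock) then A (CfsLock).
        worker("compaction-sketch", KeyspaceLock.class, CfsLock.class, latch).start();

        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        try {
            for (int i = 0; i < 100; i++) {
                long[] ids = mx.findMonitorDeadlockedThreads();
                if (ids != null) {
                    return ids.length;
                }
                Thread.sleep(50);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return 0;
    }

    public static void main(String[] args) {
        System.out.println("deadlocked threads: " + countDeadlockedThreads());
    }
}
```

On a live node the same cycle should be visible in a thread dump, with each of the two threads shown as waiting to lock the class object the other one holds. The sketch only illustrates the ordering problem; it does not suggest which of the two Cassandra code paths should change.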