Description
Compaction throttling needs to know how many compactions are active, because the configured bandwidth is divided among them.
The current count can be wrong: it uses the number of active threads in the executor, but each thread starts by acquiring a lock.
If the lock cannot be acquired immediately, the thread is counted as "active" even though it performs no I/O.
This can happen when a major compaction is triggered (major compactions acquire a write lock, while minor compactions acquire a read lock).
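The miscount described above can be reproduced outside Cassandra. The following is a minimal standalone sketch, not the actual compaction code: the class name ActiveCountDemo and its helpers are hypothetical, and the lock is created in fair mode only so that readers queue behind the waiting writer deterministically. It compares ThreadPoolExecutor.getActiveCount() with the number of tasks that actually hold the lock.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class ActiveCountDemo {

    /** Returns {threads the executor reports as active, tasks actually holding the lock}. */
    public static int[] run() {
        // Assumption: fair mode, so new readers always queue behind a waiting writer.
        ReentrantReadWriteLock lock = new ReentrantReadWriteLock(true);
        ThreadPoolExecutor executor = new ThreadPoolExecutor(
                4, 4, 0L, TimeUnit.MILLISECONDS, new LinkedBlockingQueue<>());
        AtomicInteger working = new AtomicInteger();
        CountDownLatch started = new CountDownLatch(1);
        CountDownLatch release = new CountDownLatch(1);
        try {
            // One "minor compaction" holds the read lock and keeps working until released.
            executor.execute(() -> locked(lock.readLock(), working, () -> {
                started.countDown();
                awaitQuietly(release);
            }));
            awaitQuietly(started);

            // A "major compaction" now blocks on the write lock...
            executor.execute(() -> locked(lock.writeLock(), working, () -> { }));
            waitForQueued(lock, 1);
            // ...and two more "minor compactions" queue behind it on the read lock.
            executor.execute(() -> locked(lock.readLock(), working, () -> { }));
            executor.execute(() -> locked(lock.readLock(), working, () -> { }));
            waitForQueued(lock, 3);

            return new int[] { executor.getActiveCount(), working.get() };
        } finally {
            release.countDown();   // let every task finish
            executor.shutdown();
        }
    }

    private static void locked(Lock lock, AtomicInteger working, Runnable body) {
        lock.lock();               // a thread blocked here is already "active" for the executor
        working.incrementAndGet();
        try {
            body.run();
        } finally {
            working.decrementAndGet();
            lock.unlock();
        }
    }

    private static void awaitQuietly(CountDownLatch latch) {
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Polls until at least n threads are queued on the lock.
    private static void waitForQueued(ReentrantReadWriteLock lock, int n) {
        while (lock.getQueueLength() < n) {
            try {
                Thread.sleep(1);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
        }
    }

    public static void main(String[] args) {
        int[] counts = run();
        System.out.println("executor active threads: " + counts[0]
                + ", tasks actually working: " + counts[1]);
    }
}
```

In this sketch the executor should report all four threads as active while only one task is doing any work, which is the same mismatch as the one described in this ticket.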
With compaction throughput set to 16Mb/s, we observed the following (twice):
- only 1 active compaction (a long one, running for a few hours) that started at 16Mb/s and after some time dropped to 2Mb/s, thus taking a very long time to complete
- many pending compactions
Using JMX to monitor the stack traces of the compaction threads showed that:
- 1 thread was actually compacting
- 1 thread was waiting to acquire the write lock (due to a major compaction)
- 6 threads were waiting to acquire the read lock (probably due to the thread above trying to acquire the write lock)
That makes 8 threads counted as active, which is consistent with the observed rate: 16Mb/s / 8 = 2Mb/s.
Attached is a proposed patch (very simple, not yet tested) which counts only active compactions.
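For illustration only, the idea of counting only active compactions could look like the sketch below. This is not the attached patch: the class ActiveCompactionCounter and its method names are hypothetical. The assumption is that a compaction is counted only after it has acquired its lock, so threads still waiting on the lock do not reduce the per-compaction share of the throughput.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.Lock;

public class ActiveCompactionCounter {
    // Number of compactions that hold their lock and are really running.
    private final AtomicInteger active = new AtomicInteger();

    /** Runs a compaction under the given lock; time spent waiting for the lock is not counted. */
    public void runCompaction(Lock lock, Runnable compaction) {
        lock.lock();                       // may block; not counted as active yet
        try {
            active.incrementAndGet();      // counted only once the lock is held
            try {
                compaction.run();
            } finally {
                active.decrementAndGet();
            }
        } finally {
            lock.unlock();
        }
    }

    public int activeCompactions() {
        return active.get();
    }

    /** Share of the total throughput for each running compaction (all of it when idle). */
    public double perCompactionThroughput(double totalThroughput) {
        return totalThroughput / Math.max(1, active.get());
    }
}
```

With such a counter, the situation described above (1 running compaction and 7 threads waiting on the lock) would leave the full 16Mb/s to the single running compaction instead of 2Mb/s.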