Resolution: Won't Fix
Fix Version/s: None
(This has been discussed on the mailing lists before; I am filing it now so that there is a ticket to refer to on the wiki.)
CASSANDRA-1876 is open to allow parallel compaction for the purpose of throughput. However, that only addresses one aspect of why parallel compaction is useful; the other half is ensuring that compaction is proceeding in a timely fashion at each "size tier" (for lack of a better term).
CASSANDRA-1876 is about CPU concurrency while this is about functional concurrency. I propose that compaction be a process which performs some amount of compaction work per second (I'm thinking ahead to future rate limiting; that's another ticket to be filed). That work has to be spread out over multiple compaction tiers in a way that is not coupled with CPU concurrency.
The suggested solution is to have N concurrent compaction threads running at any given moment (CASSANDRA-1876), but to have those compaction threads perform work for a variable number of compaction jobs. Compactions would be triggered according to similarly sized sstables as before, but each such compaction would be a compaction "job" that is independent of any actual compaction thread.
Compaction threads move between compaction jobs at a coarse granularity so that synchronization overhead is irrelevant (for example, a thread might go and look for other work to do every memtable_throughput_in_mb megabytes). Smaller compaction jobs take priority over larger jobs. This is intended to keep sstable counts down, and to always leave the larger jobs as the ones that have to wait, given that they are not latency sensitive anyway due to their size.
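
To make the scheduling idea concrete, here is a minimal single-process sketch of it in Python. It is not Cassandra code and not a proposed implementation. The names CompactionJob, CompactionScheduler, submit, run_round and chunk_mb are all hypothetical; chunk_mb stands in for the coarse granularity (something like memtable_throughput_in_mb) at which a thread re-checks for other work. Threads are simulated as "rounds" rather than real threads, and jobs are prioritized by their total size, which is one possible reading of "smaller jobs take priority".

```python
import heapq
from dataclasses import dataclass, field


@dataclass(order=True)
class CompactionJob:
    # Hypothetical job record; ordering uses total_mb, then name as tie-breaker.
    total_mb: int
    name: str
    remaining_mb: int = field(init=False, compare=False)

    def __post_init__(self):
        self.remaining_mb = self.total_mb


class CompactionScheduler:
    """Simulates N compaction threads sharing a queue of compaction jobs."""

    def __init__(self, num_threads, chunk_mb):
        self.num_threads = num_threads  # CPU concurrency (N threads)
        self.chunk_mb = chunk_mb        # work done before re-checking the queue
        self._pending = []              # min-heap: smallest job first

    def submit(self, name, total_mb):
        heapq.heappush(self._pending, CompactionJob(total_mb, name))

    def run_round(self):
        """Each simulated thread claims the smallest unclaimed job for one chunk."""
        count = min(self.num_threads, len(self._pending))
        claimed = [heapq.heappop(self._pending) for _ in range(count)]
        work = []
        for job in claimed:
            done = min(self.chunk_mb, job.remaining_mb)
            job.remaining_mb -= done
            work.append((job.name, done))
            if job.remaining_mb > 0:
                # Unfinished jobs go back in the queue and may be overtaken.
                heapq.heappush(self._pending, job)
        return work


# One thread: a small job submitted later overtakes the large one
# at the next chunk boundary.
sched = CompactionScheduler(num_threads=1, chunk_mb=64)
sched.submit("large", 300)
first = sched.run_round()
sched.submit("small", 50)
second = sched.run_round()
third = sched.run_round()
print(first, second, third)
```

In this run the single thread works on "large" for one chunk, switches to "small" as soon as it appears, and only then returns to "large". The number of threads and the number of jobs are independent of each other, which is the decoupling described above.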
The primary downside is that disk usage spikes would much more easily reach "double cf size" levels when many compactions are running. This is probably something that can be mitigated by CASSANDRA-1608 with its talk of limited sstable sizes.
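
As a rough illustration of the disk usage concern, the following Python snippet does the worst-case arithmetic. It assumes that a job's input sstables can only be deleted once its output is fully written, and that compaction drops no data (no overwrites or tombstones), so the output is as large as the inputs. The function name and the sizes are made up for the example.

```python
def peak_extra_disk_mb(running_job_input_mb):
    """Worst-case temporary space if every running job finishes writing
    its output before any of its inputs are deleted."""
    return sum(running_job_input_mb)


cf_size_mb = 1000  # hypothetical column family size on disk

# A single running compaction over 400 MB of sstables.
one_job = peak_extra_disk_mb([400])

# Four concurrent jobs that together cover every sstable in the cf.
many_jobs = peak_extra_disk_mb([400, 300, 200, 100])

print(cf_size_mb + one_job, cf_size_mb + many_jobs)
```

Under these assumptions, one job peaks at 1400 MB on disk, while four concurrent jobs covering the whole column family peak at 2000 MB, i.e. double the cf size.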