2. Added and used FileUtils.close(Collection<Closeable>)
3. targetBytesPerMS only changes when the number of active threads changes: it leads to nice (imo) periodic feedback of running compactions in the log when compactions start or finish
4. Assuming compaction multithreading makes it in, throttling should never be disabled... for someone who really wants to disable it, setting it to a high enough value that it never kicks in should be sufficient?
5. Maybe... but dynamically adjusting the frequency at which we throttle and update bytesRead would probably be better to do in another ticket?
Regarding the approach to setting compaction_throughput_mb_per_sec: each bucket probably contains MIN_THRESHOLD times more data than the previous bucket, and needs to be compacted 1 / MIN_THRESHOLD times as often (see the math in the description). This means that the number of buckets influences how fast you need to compact, and that each additional bucket adds a linear amount of necessary throughput (+ 1x your flush rate). Therefore, if you have 15 bucket levels, and you are flushing 1 MB/s, you need to compact at 1 MB/s * 15 = 15 MB/s.
As an example: with MIN_THRESHOLD=2, each bucket is twice as large as the previous. Say that we have 4 levels (buckets of sizes 1, 2, 4, 8) and that we need a compaction in the largest bucket. The amount of data that needs to be compacted in that bucket will be equal to 1 more than the sum of the sizes of all the other buckets (1 + 2 + 4 == 8 - 1). So, ideally we would be able to compact those 8 units in exactly the time it takes for 1 more unit to be flushed, and for the compactions of the other buckets to trickle up and refill the largest bucket. Phew?
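To double-check the arithmetic in that example, here is a minimal sketch. It assumes every bucket is exactly MIN_THRESHOLD times the size of the previous one, which real buckets only approximate; the class and method names are made up for illustration and are not part of the patch.

```java
public class BucketSizes {
    // Size of the bucket at `level` (0-based), measured in flush units.
    // Assumption: each bucket is exactly minThreshold times the previous one.
    static long bucketSize(int minThreshold, int level) {
        long size = 1;
        for (int i = 0; i < level; i++) {
            size *= minThreshold;
        }
        return size;
    }

    // Combined size of all buckets below `level`.
    static long sumOfSmallerBuckets(int minThreshold, int level) {
        long sum = 0;
        for (int i = 0; i < level; i++) {
            sum += bucketSize(minThreshold, i);
        }
        return sum;
    }

    public static void main(String[] args) {
        int minThreshold = 2;
        int largest = 3; // levels 0..3 have sizes 1, 2, 4, 8
        System.out.println(bucketSize(minThreshold, largest));          // prints 8
        System.out.println(sumOfSmallerBuckets(minThreshold, largest)); // prints 7
    }
}
```

With MIN_THRESHOLD=2 the smaller buckets always add up to one unit less than the largest bucket, which is the "1 + 2 + 4 == 8 - 1" relationship above.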
CASSANDRA-2171 will allow us to calculate the flush rate, which we can then multiply by the count of buckets (note... one tiny missing piece is determining how many buckets are "empty": an empty bucket is not created in the current approach).
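A rough sketch of that calculation, assuming the flush rate is already known (measuring it is the part CASSANDRA-2171 would provide) and that the bucket count passed in already includes any "empty" buckets. The class and method names are hypothetical; nothing here is an existing Cassandra API.

```java
public class CompactionThroughputEstimate {
    // Estimated throughput needed to keep up: one multiple of the flush rate
    // per bucket. Both inputs are assumed to be supplied by the caller.
    static double requiredCompactionMBPerSec(double flushMBPerSec, int bucketCount) {
        if (flushMBPerSec < 0 || bucketCount < 0) {
            throw new IllegalArgumentException("flush rate and bucket count must be non-negative");
        }
        return flushMBPerSec * bucketCount;
    }

    public static void main(String[] args) {
        // The example from above: 15 bucket levels, flushing at 1 MB/s.
        System.out.println(requiredCompactionMBPerSec(1.0, 15)); // prints 15.0
    }
}
```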
> Final question. Would it be better to have fewer parallel compactions
As a base case, with no parallelism at all, you will fall behind on compaction, because every new bucket is a chance to compact. It's a fundamental question, but I haven't thought about it... sorry.