@Lars/Stack: note that the number of StoreFiles needed to store N bytes of data is O(log N) with the existing compaction algorithm, so setting the compaction min size to a low value will not result in significantly more files. Furthermore, what hurts performance is not the number of files but the size of each file. The extra files will be very small and take up only a minority of the space in the LRU cache. Every time you unnecessarily compact files, you have to repopulate that StoreFile in the LRU cache and incur a lot of disk reads on top of the obvious write increase. All of which is to say that I would recommend defaulting it that low, because the downsides are minimal and the benefit can be substantial IO savings.
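As a rough illustration of dropping that default, here is a client-side configuration sketch. I'm assuming the threshold is exposed as the hbase.hstore.compaction.min.size property; the exact key name and value are illustrative, not a recommendation from this issue.
{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

public class LowCompactionMinSize {
    public static void main(String[] args) {
        Configuration conf = HBaseConfiguration.create();
        // Assumed property name for the compaction minimum-size threshold:
        // StoreFiles below this size are always candidates for minor compaction.
        // Keeping it low means the many small flush files stay cheap to merge
        // without repeatedly rewriting the large StoreFiles.
        conf.setLong("hbase.hstore.compaction.min.size", 4L * 1024 * 1024); // 4 MB
    }
}
{code}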
At the same time, I'd think this issue is still worth some time; if there are lots of CFs and only one is filling, it's silly to flush the others as we do now just because one is over the threshold.
Why is this silly? With cache-on-write, the data is still cached in memory; it is just migrated from the MemStore to the BlockCache, which has comparable performance. Furthermore, BlockCache data is compressed, so it takes up less space. Flushing also minimizes the number of HLogs and decreases recovery time. Flushing would be bad if it meant we were not optimally using the global MemStore size, but we currently are.
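To make the cache-on-write point concrete, here is a minimal per-CF sketch. I'm assuming the HColumnDescriptor client API with setCacheDataOnWrite/setBlockCacheEnabled, so treat the method names as illustrative rather than a statement about what this patch adds.
{code:java}
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.TableName;

public class CacheOnWriteExample {
    public static HTableDescriptor describe() {
        HTableDescriptor table = new HTableDescriptor(TableName.valueOf("mytable"));
        HColumnDescriptor cf = new HColumnDescriptor("cf");
        cf.setBlockCacheEnabled(true);  // serve reads from the LRU BlockCache
        cf.setCacheDataOnWrite(true);   // populate the cache as blocks are written during flush
        table.addFamily(cf);
        return table;
    }
}
{code}
With that enabled, a flush moves data from the MemStore into the BlockCache rather than pushing it out of memory entirely, which is why the flush itself is not the performance problem.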
This surely seems like a setting specific to this use case, and there are other use cases that need a slightly different setting. If you mix the two on the same cluster, then having only one global setting to adjust this seems restrictive? Should this be a setting per table, like the flush size?
I think this is a better default, not a one-size-fits-all setting. I agree that this should be toggleable on a per-CF basis, hence
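For what Lars is asking about, the flush size already has a per-table analogue on the table descriptor, and a per-CF toggle could piggyback on the column descriptor in the same way. A sketch of what that could look like; the COMPACTION_MIN_SIZE attribute key below is purely hypothetical, not an existing setting.
{code:java}
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.TableName;

public class PerTableSettings {
    public static HTableDescriptor describe() {
        HTableDescriptor table = new HTableDescriptor(TableName.valueOf("mytable"));
        // Per-table flush threshold, analogous to the global memstore flush size.
        table.setMemStoreFlushSize(64L * 1024 * 1024); // 64 MB
        // Hypothetical per-CF override carried as a descriptor attribute.
        HColumnDescriptor cf = new HColumnDescriptor("cf");
        cf.setValue("COMPACTION_MIN_SIZE", String.valueOf(4L * 1024 * 1024));
        table.addFamily(cf);
        return table;
    }
}
{code}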