Currently, the maintenance manager prioritizes flushes over compactions whenever the flush operations are retaining WAL segments. The goal is to keep the amount of in-memory data from growing so large that restarts, which must replay the WAL for unflushed data, become very slow. However, this has a somewhat unintuitive negative effect on performance:
- with the default of retaining just two segments, flushes become highly prioritized when the MRS only has ~128MB of data, regardless of the "flush_threshold_mb" configuration
- this creates lots of overlapping rowsets in the case of random-write applications
- because flushes are prioritized over compactions, compactions rarely run
- the frequent flushes, combined with the low priority of compactions, mean that after a few days of constant inserts we often end up with average "bloom lookups per op" metrics of 50-100, which is quite slow even if the blooms fit in cache.
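
The interaction described in the list above can be illustrated with a toy model. This is only a sketch under stated assumptions, not the maintenance manager's actual code: the names `Op`, `pick_op`, and `simulate` are hypothetical, the 64MB segment size is assumed so that two segments match the ~128MB figure, and the model runs one operation per tick and treats a compaction as merging all rowsets into one.

```python
from dataclasses import dataclass

SEGMENT_MB = 64            # assumed WAL segment size (two segments ~ 128MB)
MAX_RETAINED_SEGMENTS = 2  # the "retain just two segments" default


@dataclass
class Op:
    name: str
    retained_wal_mb: int     # WAL that running this op would release
    perf_improvement: float  # hypothetical score; higher is better


def pick_op(ops):
    """Toy policy: an op pinning too much WAL always wins;
    otherwise pick the op with the best positive perf score."""
    limit = SEGMENT_MB * MAX_RETAINED_SEGMENTS
    wal_bound = [op for op in ops if op.retained_wal_mb >= limit]
    if wal_bound:
        return max(wal_bound, key=lambda op: op.retained_wal_mb)
    candidates = [op for op in ops if op.perf_improvement > 0]
    return max(candidates, key=lambda op: op.perf_improvement, default=None)


def simulate(ticks, insert_mb_per_tick):
    """Run one maintenance op per tick under a constant insert rate."""
    mrs_mb = 0   # data currently in the MRS (and pinned in the WAL)
    rowsets = 0  # overlapping on-disk rowsets
    counts = {"flush": 0, "compact": 0}
    for _ in range(ticks):
        mrs_mb += insert_mb_per_tick
        ops = [
            Op("flush", retained_wal_mb=mrs_mb, perf_improvement=0.0),
            Op("compact", retained_wal_mb=0,
               perf_improvement=float(max(rowsets - 1, 0))),
        ]
        op = pick_op(ops)
        if op is None:
            continue
        counts[op.name] += 1
        if op.name == "flush":
            mrs_mb = 0
            rowsets += 1  # each flush adds another overlapping rowset
        else:
            rowsets = 1   # toy compaction merges everything
    return counts, rowsets


if __name__ == "__main__":
    # Fast inserts: the flush is WAL-bound on every tick, so compaction starves.
    print(simulate(10, 128))
    # Slower inserts: compaction gets a turn between flushes.
    print(simulate(10, 64))
```

In this model, once the insert rate is high enough that the MRS reaches the WAL retention limit on every scheduling round, compactions never run and the rowset count grows without bound, which is the same shape as the growing "bloom lookups per op" observed above.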
Relates to: KUDU-38 (bootstrap should not replay logs that are known to be fully flushed)