When under memory pressure, we aggressively perform whichever maintenance operation frees the most memory. Right now, the only ops that register anchored memory are MemRowSet (MRS) and DeltaMemStore (DMS) flushes.
In practice, this means a couple of things:
- In most cases, we'll prioritize flushing MRSs way ahead of flushing DMSs, since updates are spread across many DMSs and so each DMS tends to be small, whereas any non-trivial insert workload will well up into a single MRS for an entire tablet.
- We'll only flush a single DMS at a time to free memory. Because of this, and because we'll likely prioritize MRS flushes over DMS flushes, we may end up with a ton of tiny DMSs in a tablet that we'll never flush. This can end up bloating the WALs because each DMS may be anchoring some WAL segments.
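To make the starvation concrete, here is a minimal sketch of the policy described above. It is a toy model, not the actual maintenance manager: `FlushOp`, `pick_op`, and `simulate` are hypothetical names, and the only assumption carried over from the description is "pick the op that reports the most anchored memory, one op at a time".

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FlushOp:
    kind: str          # "MRS" or "DMS"
    target: str        # hypothetical label: tablet or rowset name
    ram_anchored: int  # bytes the op reports it would free


def pick_op(ops) -> Optional[FlushOp]:
    """Return the op reporting the most anchored memory, or None if none anchor any."""
    candidates = [op for op in ops if op.ram_anchored > 0]
    return max(candidates, key=lambda op: op.ram_anchored, default=None)


def simulate(rounds, insert_bytes_per_round, dms_sizes):
    """Run the picker once per round while inserts keep refilling the MRS."""
    mrs_bytes = 0
    dms = dict(dms_sizes)  # rowset name -> DMS size in bytes
    history = []
    for _ in range(rounds):
        mrs_bytes += insert_bytes_per_round
        ops = [FlushOp("MRS", "tablet", mrs_bytes)]
        ops += [FlushOp("DMS", name, size) for name, size in dms.items()]
        op = pick_op(ops)
        if op is None:
            break  # nothing left that anchors memory
        history.append((op.kind, op.target))
        if op.kind == "MRS":
            mrs_bytes = 0        # the MRS is flushed and starts refilling
        else:
            del dms[op.target]   # only one DMS is flushed per round
    return history, dms
```

With a steady insert load (say 64 MB per round) and a few DMSs of 1-2 MB each, every round in this model picks the MRS flush and the DMSs are never touched. They only get flushed, one per round, once inserts stop.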
A couple thoughts on small things we can do to improve this:
- Register the DMS size as RAM anchored by a compaction. This would mean that we can schedule compactions to flush DMSs en masse. We could still end up always prioritizing MRS flushes, depending on how quickly we're inserting.
- We currently register the amount of disk space a LogGC would free up. We could do something similar, but register how many log anchors an op could release. This would be a bit trickier, since the log anchors aren't solely determined by the mem-stores (e.g. we'll anchor segments to catch up slow followers).
- Introduce a new op (or change the flush DMS op) that would flush as many DMSs as we can for a given tablet.
Between these, the first seems like it'd be an easy win.
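A rough sketch of why the first and third ideas help, under the same toy assumptions as the policy described above. `RowSet`, `compaction_ram_anchored`, and `flush_all_dms` are hypothetical names used for illustration only, and the sizes are made up. The point is just that summing the DMS sizes lets them compete with the MRS as a group.

```python
from dataclasses import dataclass

MB = 1 << 20


@dataclass
class RowSet:
    name: str
    dms_bytes: int  # size of this rowset's DMS


def largest_single_dms(rowsets):
    """Today's view: each DMS flush competes on its own size."""
    return max((rs.dms_bytes for rs in rowsets), default=0)


def compaction_ram_anchored(rowsets):
    """Idea 1 (assumed form): a compaction reports the DMS bytes of the rowsets it covers."""
    return sum(rs.dms_bytes for rs in rowsets)


def flush_all_dms(rowsets):
    """Idea 3 (assumed form): one op flushes every DMS in the tablet; returns bytes freed."""
    freed = sum(rs.dms_bytes for rs in rowsets)
    for rs in rowsets:
        rs.dms_bytes = 0
    return freed


# Illustrative numbers only: 200 small DMSs next to a 64 MB MRS.
rowsets = [RowSet(f"rs{i}", MB // 2) for i in range(200)]
mrs_bytes = 64 * MB
```

In this example no single DMS (512 KB) can ever outrank the 64 MB MRS, but the aggregate (100 MB) can. As noted above, a fast enough insert workload could still keep the MRS ahead, so this narrows the gap without guaranteeing that DMSs get flushed.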