Patch that optionally creates a global heap usage threshold and tries to keep total memtable size under that.
The two main points of interest are Memtable.updateLiveRatio and MeteredFlusher.
MeteredFlusher is what checks memory usage (once per second) and kicks off the flushes. Note that naively flushing when we hit the threshold is wrong, since you can have multiple memtables in flight during the flush process. To address this, we track inactive but unflushed memtables and include those in our total. We also aggressively flush any memtable that reaches the level of "if my entire flush pipeline were full of memtables of this size, how big could I allow them to be?"
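As a rough illustration of the two rules above (this is not the code in the patch), the per-second check could look something like the sketch below. The class name, the method names, and the idea of passing the pipeline depth as a plain `pipelineSlots` count are all assumptions made for the example; how the patch actually derives the pipeline depth is not described here.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the flush decision; names are illustrative only.
final class MeteredFlushSketch {
    // Largest size a single memtable may reach if every slot in the flush
    // pipeline were occupied by a memtable of that same size.
    static long perMemtableLimit(long globalThresholdBytes, int pipelineSlots) {
        return globalThresholdBytes / pipelineSlots;
    }

    // Returns the indexes of active memtables that should be flushed now.
    // inactiveUnflushedBytes covers memtables that have been switched out
    // but not yet written to disk; they still occupy heap, so they count.
    static List<Integer> pickFlushes(long[] activeLiveBytes,
                                     long inactiveUnflushedBytes,
                                     long globalThresholdBytes,
                                     int pipelineSlots) {
        List<Integer> toFlush = new ArrayList<>();
        long limit = perMemtableLimit(globalThresholdBytes, pipelineSlots);
        long total = inactiveUnflushedBytes;
        int largest = -1;
        for (int i = 0; i < activeLiveBytes.length; i++) {
            total += activeLiveBytes[i];
            if (activeLiveBytes[i] > limit) {
                // Aggressive rule: this memtable alone is too big for the pipeline.
                toFlush.add(i);
            } else if (largest < 0 || activeLiveBytes[i] > activeLiveBytes[largest]) {
                largest = i;
            }
        }
        // Global rule: nothing tripped the per-memtable limit, but the total
        // (active plus in-flight) is over the threshold, so flush the largest.
        if (toFlush.isEmpty() && total > globalThresholdBytes && largest >= 0) {
            toFlush.add(largest);
        }
        return toFlush;
    }
}
```

With a threshold of 1000 bytes and 4 pipeline slots, a 300-byte memtable is flushed on its own (limit 250), while three 200-byte memtables are only touched once 500 bytes of unflushed memtables push the total past the threshold.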
Since counting each object's size is far too slow to be useful directly, we compute the ratio of memory size to serialized size in the background and update it periodically; that is what updateLiveRatio does. MeteredFlusher then bases its work on actual serialized size, multiplied by this ratio.
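To make the estimate concrete, here is a minimal sketch of the idea, assuming the ratio is stored as heap bytes per serialized byte and starts at a placeholder value of 1.0. The class and the `estimatedLiveBytes` helper are hypothetical; only the name updateLiveRatio comes from the patch, and the real method may take different arguments.

```java
// Hypothetical sketch: a slow background measurement feeds a cheap hot-path estimate.
final class LiveRatioSketch {
    // Heap bytes per serialized byte; 1.0 is an assumed starting guess.
    private volatile double liveRatio = 1.0;

    // Called occasionally from a background task, after an expensive
    // deep measurement of the memtable's real heap footprint.
    void updateLiveRatio(long measuredHeapBytes, long serializedBytes) {
        if (serializedBytes <= 0) {
            return; // nothing written yet; keep the previous ratio
        }
        liveRatio = (double) measuredHeapBytes / serializedBytes;
    }

    // Cheap estimate used by the flusher: serialized size times the ratio.
    long estimatedLiveBytes(long serializedBytes) {
        return (long) (serializedBytes * liveRatio);
    }
}
```

For example, if a background measurement finds 3000 heap bytes for 1000 serialized bytes, a memtable with 2000 serialized bytes is later estimated at 6000 bytes without walking its objects again.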
One last note: the config code is a little messy because we want to leave behavior unchanged (i.e., only use the old per-CF thresholds) if the setting is absent, as it would be for an upgrader. But we also want a setting that means "pick a reasonable default based on heap usage"; hence the distinction between null (setting absent) and -1 (autocompute).
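The null versus -1 distinction can be sketched as follows. This is only an illustration: the method name, the choice of megabytes as the unit, and the one-third-of-heap default are assumptions for the example, not values taken from the patch.

```java
// Hypothetical sketch of resolving the global memtable threshold setting.
final class MemtableSpaceConfigSketch {
    static final int AUTOCOMPUTE = -1;

    // Returns null when the global threshold is disabled, meaning only the
    // old per-CF thresholds apply (the upgrader case: setting absent).
    static Long resolveThresholdBytes(Integer configuredMb, long maxHeapBytes) {
        if (configuredMb == null) {
            return null; // setting absent: behavior unchanged
        }
        if (configuredMb == AUTOCOMPUTE) {
            return maxHeapBytes / 3; // assumed fraction, for illustration only
        }
        if (configuredMb <= 0) {
            throw new IllegalArgumentException("threshold must be positive or -1");
        }
        return configuredMb * 1024L * 1024L; // explicit value in MB
    }
}
```

In a running JVM the heap argument would typically come from `Runtime.getRuntime().maxMemory()`.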
I tested by creating the stress schema, then modifying the per-CF settings to be multiple TB, so that only the new global flusher affects things. Then I created half a GB of commitlog files to replay; CL replay hammers it much harder than even stress.java.
It was successful in preventing OOM (or even the "emergency flushing" at 85% of heap), but heap usage as reported by CMS was consistently about 25% higher than what MeteredFlusher thought it should be. It may be that we can apply a fudge factor for this; otherwise, tuning by watching CMS vs. estimated size and adjusting the setting manually to compensate is still much easier than the status quo of per-CF tuning.
To experiment, I recommend also patching the log4j settings as follows:
--- conf/log4j-server.properties (revision 1085010)
+++ conf/log4j-server.properties (working copy)
@@ -35,7 +35,8 @@
# Application logging options