What is the relationship between BASE_SAMPLING_LEVEL and MIN_SAMPLING_LEVEL with indexInterval?
BASE/MIN_SAMPLING_LEVEL are orthogonal to indexInterval. BASE_SAMPLING_LEVEL essentially sets the granularity at which you can down/upsample. MIN_SAMPLING_LEVEL sets a limit on how low you can downsample. (I'll note that we could potentially raise indexInterval alongside these changes in order to have more summary entries for hot sstables.)
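To make that relationship concrete, here's a rough sketch. The constant values and the helper name are assumptions for illustration, not the actual implementation; the point is just that the effective spacing between summary entries grows as the sampling level drops below the base.

```java
// Sketch (hypothetical names/values): indexInterval sets the spacing at
// full sampling; downsampling below BASE_SAMPLING_LEVEL spreads entries
// out proportionally, and MIN_SAMPLING_LEVEL caps how far that can go.
public class SamplingMath {
    static final int BASE_SAMPLING_LEVEL = 128; // granularity of down/upsampling (assumed)
    static final int MIN_SAMPLING_LEVEL = 32;   // floor on downsampling (assumed)

    static int effectiveInterval(int indexInterval, int samplingLevel) {
        return indexInterval * BASE_SAMPLING_LEVEL / samplingLevel;
    }

    public static void main(String[] args) {
        System.out.println(effectiveInterval(128, BASE_SAMPLING_LEVEL)); // full sampling: 128
        System.out.println(effectiveInterval(128, MIN_SAMPLING_LEVEL));  // minimum sampling: 512
    }
}
```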
How many rows do we get for 5% of a 8GB heap?
That gives us ~410 MiB to work with. If we assume the average key length is 8 bytes, each summary entry uses 20 bytes of space, giving us ~21 million summary entries.
At full sampling, that's 21M * 128 ≈ 2.7 billion rows, assuming no overlap across sstables. At the minimum sampling level, that's ~11 billion rows.
If the average key size is 16 bytes instead, those figures drop to ~2 and ~8 billion rows, respectively.
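A back-of-envelope check of the figures above. The 20-byte entry size (8-byte key plus ~12 bytes of position/overhead) is the assumption used in the text, not a measured value:

```java
// Verify the memory-pool arithmetic: 5% of an 8 GiB heap, 20 bytes per
// summary entry, index interval 128 at full sampling and an effective
// interval of 512 at the minimum sampling level.
public class SummaryMemoryMath {
    public static void main(String[] args) {
        long heapBytes = 8L * 1024 * 1024 * 1024; // 8 GiB heap
        long poolBytes = heapBytes / 20;          // 5% of the heap ≈ 410 MiB
        long entries = poolBytes / 20;            // 20 bytes/entry ≈ 21 million entries
        long fullRows = entries * 128;            // ≈ 2.7 billion rows
        long minRows = entries * 512;             // ≈ 11 billion rows
        System.out.println(entries + " entries");
        System.out.println(fullRows + " rows at full sampling");
        System.out.println(minRows + " rows at minimum sampling");
    }
}
```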
Isn't it a minor bug to simply ignore compacting sstables? I'd suggest reducing the memory pool available to the non-compacting sstables by the amount already allocated to the compacting ones.
Good point, I agree.
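The suggested accounting could be sketched like this (the method and parameter names are hypothetical): the pool handed to the redistribution pass over non-compacting sstables shrinks by whatever the compacting sstables' summaries currently occupy.

```java
// Hypothetical sketch: subtract the compacting sstables' summary footprint
// from the total pool before redistributing among non-compacting sstables.
public class PoolAccounting {
    static long redistributablePool(long totalPoolBytes, long compactingSummaryBytes) {
        // Clamp at zero in case compacting sstables already exceed the pool.
        return Math.max(0, totalPoolBytes - compactingSummaryBytes);
    }

    public static void main(String[] args) {
        System.out.println(redistributablePool(100, 30)); // 70
        System.out.println(redistributablePool(10, 30));  // 0
    }
}
```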
Could we just resample at compaction time instead of dealing with refcounting or locking? That probably gives up too much of the potential benefits.
Yeah, that would probably be okay for small sstables that are compacted frequently, but the large sstables would be tuned poorly, and those make up the majority of the memory use.
I think we could make it almost as elegant by reusing the DataTracker replace mechanism (originally built for compaction) to construct a new SSTableReader and swap it in without extra concurrency controls.
That's a good idea; I think it would be fairly clean. I'll give that a shot.
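The publish-by-swap pattern behind that idea can be illustrated in miniature. This is not the DataTracker API itself, just the general shape: the resampler builds a complete new summary off to the side and publishes it with a single atomic reference swap, so readers never see a half-built summary and need no locks.

```java
import java.util.concurrent.atomic.AtomicReference;

// Illustrative only: in the actual patch, the swapped object would be a
// whole new SSTableReader replaced via DataTracker, not a bare array.
public class SummaryHolder {
    private final AtomicReference<long[]> summary =
            new AtomicReference<>(new long[] {0, 128, 256});

    long[] current() {                      // read side: lock-free snapshot
        return summary.get();
    }

    void replace(long[] resampled) {        // write side: atomic publish
        summary.set(resampled);
    }
}
```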
Is the idea behind touching it in DD to force the mbean to be loaded, or is there a circular dependency that breaks without it?
Neither the IndexSummaryManager singleton nor the mbean is loaded without that. No other classes use IndexSummaryManager, so its static fields are never initialized. (Merely importing the class doesn't trigger class initialization.)
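The rule at play is that a Java class's static initializers run only on first active use (JLS 12.4.1), not when the class is merely imported or named, which is why something has to actually touch the class. A minimal illustration, with made-up names standing in for the singleton/mbean registration:

```java
// LazySingleton's static block stands in for creating the singleton and
// registering the mbean; nothing here runs until the class is touched.
class LazySingleton {
    static boolean started;
    static { started = true; }
}

public class TouchDemo {
    public static void main(String[] args) {
        // Naming or importing LazySingleton alone runs nothing; the first
        // access to a non-constant static member triggers initialization,
        // so the static block has run by the time this read completes:
        System.out.println(LazySingleton.started); // prints "true"
    }
}
```

One caveat: touching a compile-time constant (a `static final` field initialized to a constant expression) would not work, because such reads are inlined by the compiler and never trigger initialization.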