I created a new merge policy to take advantage of non-contiguous merging (LUCENE-1076) and to fix certain limitations of LogMergePolicy.
The new policy does not support contiguous merging, and it always merges according to byte size, always pro-rated by each segment's percentage of deleted documents.
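To make "pro-rated by deletions" concrete, here is a minimal sketch. It assumes a segment's effective size is its byte size scaled by the fraction of documents that are still live. The class and method names are hypothetical, and the real patch may weight deletions differently.

```java
/** Hypothetical helper, not part of Lucene: a segment's size pro-rated by its deletions. */
public final class ProratedSize {
    private ProratedSize() {}

    /**
     * Scales sizeInBytes by the fraction of documents that are still live.
     * Assumes maxDoc counts all documents in the segment, deleted or not.
     */
    public static long proratedBytes(long sizeInBytes, int maxDoc, int delCount) {
        if (maxDoc <= 0 || delCount < 0 || delCount > maxDoc) {
            throw new IllegalArgumentException("need maxDoc > 0 and 0 <= delCount <= maxDoc");
        }
        double liveFraction = 1.0 - (double) delCount / maxDoc;
        return (long) (sizeInBytes * liveFraction);
    }

    public static void main(String[] args) {
        // 25 of 100 docs deleted: the 1,000-byte segment counts as 750 bytes.
        System.out.println(proratedBytes(1000, 100, 25)); // prints 750
    }
}
```

Under this weighting, a segment that is mostly deletions looks small to the policy, so it becomes a cheap, attractive merge candidate.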
The policy's core logic is similar to LogMP's: it tries to merge roughly equal-sized segments, maxMergeAtOnce (renamed from mergeFactor) at a time, which produces the usual exponential staircase pattern when you feed it roughly equal-sized segments.
You configure the approximate maximum merged segment size (unlike LogMP, where you configure the maximum size of a segment to be merged, which was always a source of confusion!). Also unlike LogMP, when segments are getting close to being too large, the new policy will merge fewer segments, e.g. down to merging pairwise, to reach approximately the maximum allowed size. This is important because it makes that setting more "accurate"; I now default it to 5 GB (vs. LogMP's 2 GB).
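The "merge fewer segments near the cap" behavior can be sketched as a greedy pick over segment sizes sorted largest-first: take segments up to maxMergeAtOnce, but skip any segment that would push the merged size past the cap. This is an illustrative assumption, not the patch's actual selection code, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch, not Lucene code: pick segments for one merge under a size cap. */
public final class MergeSelection {
    private MergeSelection() {}

    /**
     * Greedily picks up to maxMergeAtOnce sizes from sortedSizes (largest first)
     * whose sum stays at or below maxMergedBytes. A segment that would overshoot
     * the cap is skipped, so near the cap fewer segments end up being merged.
     */
    public static List<Long> pick(List<Long> sortedSizes, int maxMergeAtOnce, long maxMergedBytes) {
        List<Long> chosen = new ArrayList<>();
        long total = 0;
        for (long size : sortedSizes) {
            if (chosen.size() == maxMergeAtOnce) {
                break;
            }
            if (total + size > maxMergedBytes) {
                continue; // would exceed the cap; try the next (smaller) segment
            }
            chosen.add(size);
            total += size;
        }
        return chosen;
    }

    public static void main(String[] args) {
        long gb = 1L << 30;
        List<Long> big = new ArrayList<>();
        List<Long> small = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            big.add(2 * gb);          // ten 2 GB segments
            small.add(100L << 20);    // ten 100 MB segments
        }
        System.out.println(pick(big, 10, 5 * gb).size());   // prints 2 (pairwise merge)
        System.out.println(pick(small, 10, 5 * gb).size()); // prints 10 (full-width merge)
    }
}
```

With a 5 GB cap, ten 2 GB segments only allow a pairwise merge, while ten 100 MB segments still merge ten at a time.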
There is a separate maxMergeAtOnceExplicit that controls "explicit" merging (i.e., when the app calls optimize or expungeDeletes, and maybe in the future also addIndexes); I defaulted it to 30. There is no maximum segment size for optimize.
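As a rough illustration of what this setting controls, assume every explicit merge combines at most maxMergeAtOnceExplicit segments into one. Each merge then removes at most maxMergeAtOnceExplicit - 1 segments, so optimizing N segments down to one takes at least ceil((N - 1) / (maxMergeAtOnceExplicit - 1)) merges. The helper below is hypothetical and ignores the I/O cost of each individual merge.

```java
/** Hypothetical helper, not Lucene code: lower bound on merges needed to optimize. */
public final class ExplicitMergeCount {
    private ExplicitMergeCount() {}

    /**
     * Minimum number of merges to reduce segmentCount segments to one, assuming
     * each merge combines at most maxMergeAtOnceExplicit segments into one.
     */
    public static int mergesToOptimize(int segmentCount, int maxMergeAtOnceExplicit) {
        if (segmentCount < 1 || maxMergeAtOnceExplicit < 2) {
            throw new IllegalArgumentException("need segmentCount >= 1 and maxMergeAtOnceExplicit >= 2");
        }
        int removedPerMerge = maxMergeAtOnceExplicit - 1;
        // Integer form of ceil((segmentCount - 1) / removedPerMerge).
        return (segmentCount - 1 + removedPerMerge - 1) / removedPerMerge;
    }

    public static void main(String[] args) {
        System.out.println(mergesToOptimize(100, 30)); // prints 4
        System.out.println(mergesToOptimize(100, 10)); // prints 11
    }
}
```

So for 100 segments, merging up to 30 at once needs at least 4 merges, versus at least 11 when merging 10 at once.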
The big difference vs LogMP is that the new policy does not "over-merge", meaning it does not "pay it forward"/forcefully cascade the way LogMP does today. This fixes the "inadvertent optimize" that LogMP does.
For any given index size, the new policy computes a budget of how many segments that index is allowed to have (i.e., it enumerates the steps in the staircase, based on mergeAtOnce, the [floored] minimum segment size, and the total bytes in the index); then, if the index is over budget, it picks the least-cost merge. This results in a smoother progression of the number of segments over time.
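One way to compute such a budget, as a minimal sketch: start with a tier of floored-minimum-size segments, allow segmentsPerTier segments on each tier, grow the tier's segment size by a factor of mergeAtOnce at each step, and stop when the remaining bytes no longer fill a whole tier. The class and method names are hypothetical, and details such as the rounding on the last tier are assumptions.

```java
/** Hypothetical sketch, not Lucene code: segment-count budget for an index. */
public final class SegmentBudget {
    private SegmentBudget() {}

    /**
     * floorBytes:      floored minimum segment size (size of the first tier's segments)
     * segmentsPerTier: segments tolerated on each tier of the staircase
     * mergeAtOnce:     factor by which segment size grows from one tier to the next
     */
    public static int allowedSegmentCount(long totalBytes, long floorBytes,
                                          int segmentsPerTier, int mergeAtOnce) {
        if (totalBytes < 0 || floorBytes <= 0 || segmentsPerTier < 1 || mergeAtOnce < 2) {
            throw new IllegalArgumentException("invalid budget parameters");
        }
        double levelSize = floorBytes;
        double bytesLeft = totalBytes;
        int allowed = 0;
        while (true) {
            double segsAtLevel = bytesLeft / levelSize;
            if (segsAtLevel < segmentsPerTier) {
                // Remaining bytes do not fill this tier: allow a partial top step.
                return allowed + (int) Math.ceil(segsAtLevel);
            }
            allowed += segmentsPerTier;                 // a full step of the staircase
            bytesLeft -= segmentsPerTier * levelSize;
            levelSize *= mergeAtOnce;                   // next tier holds bigger segments
        }
    }

    /** True if the index has more segments than its budget, i.e. a merge should be picked. */
    public static boolean overBudget(int segmentCount, long totalBytes, long floorBytes,
                                     int segmentsPerTier, int mergeAtOnce) {
        return segmentCount > allowedSegmentCount(totalBytes, floorBytes, segmentsPerTier, mergeAtOnce);
    }

    public static void main(String[] args) {
        long mb = 1L << 20;
        System.out.println(allowedSegmentCount(1000 * mb, 2 * mb, 10, 10)); // prints 24
    }
}
```

For example, with a 2 MB floor and segmentsPerTier = mergeAtOnce = 10, a 1,000 MB index is allowed 24 segments (10 + 10 + 4). How the least-cost merge is chosen once the index is over budget is not sketched here.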
There is a new setting, segmentsPerTier, that lets you control how many segments per level you can "tolerate". This is a nice knob for trading off merge cost against search cost. It defaults to 10, which matches the staircase pattern that LogMP produces, but you can now control the "width" of the stairs in the staircase separately from how many segments are merged at once for non-explicit merges.
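To show that the tier width and the merge width are now independent knobs, here is a hypothetical settings holder (not a real Lucene class). The 5 GB, 30 and 10 defaults come from this description; the maxMergeAtOnce default of 10 is my assumption, chosen to mirror LogMP's default mergeFactor.

```java
/**
 * Hypothetical settings holder, not a Lucene class. The maxMergeAtOnce default
 * of 10 is an assumption; the other defaults follow the proposal text.
 */
public final class TieredSettings {
    public static final long GB = 1L << 30;

    public final int maxMergeAtOnce;         // segments merged at once (natural merges)
    public final int maxMergeAtOnceExplicit; // segments merged at once (optimize/expungeDeletes)
    public final int segmentsPerTier;        // segments tolerated per tier ("width" of a stair)
    public final long maxMergedSegmentBytes; // approximate cap on a merged segment's size

    public TieredSettings(int maxMergeAtOnce, int maxMergeAtOnceExplicit,
                          int segmentsPerTier, long maxMergedSegmentBytes) {
        // Validation rules here are this sketch's own choice.
        if (maxMergeAtOnce < 2 || maxMergeAtOnceExplicit < 2 || segmentsPerTier < 1) {
            throw new IllegalArgumentException("merge widths must be >= 2, segmentsPerTier >= 1");
        }
        if (maxMergedSegmentBytes <= 0) {
            throw new IllegalArgumentException("maxMergedSegmentBytes must be positive");
        }
        this.maxMergeAtOnce = maxMergeAtOnce;
        this.maxMergeAtOnceExplicit = maxMergeAtOnceExplicit;
        this.segmentsPerTier = segmentsPerTier;
        this.maxMergedSegmentBytes = maxMergedSegmentBytes;
    }

    public static TieredSettings defaults() {
        return new TieredSettings(10, 30, 10, 5 * GB);
    }

    /** Copy with a different tier width; the merge widths and size cap are unchanged. */
    public TieredSettings withSegmentsPerTier(int newSegmentsPerTier) {
        return new TieredSettings(maxMergeAtOnce, maxMergeAtOnceExplicit,
                                  newSegmentsPerTier, maxMergedSegmentBytes);
    }
}
```

For example, `TieredSettings.defaults().withSegmentsPerTier(20)` tolerates wider stairs (less merging, more segments to search) while still merging 10 segments at a time.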
It has useCompoundFile and noCFSRatio just like LogMP.
It has a new setting "expungeDeletesPctAllowed", default 10%, which allows expungeDeletes to skip merging a segment if it has < 10% deletions.
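The expungeDeletesPctAllowed check might look like the following minimal sketch. The record and method names are hypothetical, and treating a segment at exactly the threshold as eligible is an assumption; segments with no deletions are never selected.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch, not Lucene code: which segments expungeDeletes would merge. */
public final class ExpungeCandidates {
    private ExpungeCandidates() {}

    /** Hypothetical per-segment stats: maxDoc counts all docs, including deleted ones. */
    public record Segment(String name, int maxDoc, int delCount) {
        public double pctDeleted() {
            return maxDoc == 0 ? 0.0 : 100.0 * delCount / maxDoc;
        }
    }

    /** Names of segments with deletions at or above pctAllowed; the rest are skipped. */
    public static List<String> eligible(List<Segment> segments, double pctAllowed) {
        List<String> out = new ArrayList<>();
        for (Segment s : segments) {
            if (s.delCount() > 0 && s.pctDeleted() >= pctAllowed) {
                out.add(s.name());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Segment> segs = List.of(
            new Segment("_a", 100, 5),    // 5% deleted: skipped at the 10% default
            new Segment("_b", 100, 20));  // 20% deleted: merged
        System.out.println(eligible(segs, 10.0)); // prints [_b]
    }
}
```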
I think we should keep LogMergePolicy available for apps that want contiguous merging, merging by doc count, no pro-rating by deletions, or a maximum segment size enforced during optimize. But with this in place, I'd remove the non-contiguous support that was added to LogMergePolicy under LUCENE-1076 and make this new MP the default.