Lucene - Core / LUCENE-854

Create merge policy that doesn't periodically inadvertently optimize

    Details

    • Type: New Feature
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 2.2
    • Fix Version/s: 3.2, 4.0-ALPHA
    • Component/s: core/index
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      The current merge policy, at every maxBufferedDocs *
      power-of-mergeFactor docs added, will do a fully cascaded merge, which
      is the same as an optimize.

      I think this is not good because at that "optimization point", the
      particular addDocument call is [surprisingly] very expensive. While
      the cost, amortized over all addDocument calls, is low, it is paid
      "up front" and in a very "bunched up" manner.

      I think of this as "pay it forward": you are paying the full cost of
      an optimize right now on the expectation / hope that you will be
      adding a great many more docs. But, if you don't add that many more
      docs, then, the amortized cost for your index is in fact far higher
      than it should have been. Better to "pay as you go" instead.

      So we could make a small change to the policy by only merging the
      first mergeFactor segments once we hit 2X the merge factor. With
      mergeFactor=10, when we have created the 20th level 0 (just flushed)
      segment, we merge the first 10 into a level 1 segment. Then on
      creating another 10 level 0 segments, we merge the second set of 10
      level 0 segments into a level 1 segment, etc.
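
      As a concrete sketch of that trigger (my illustration, not the
      actual patch; Segment, PayAsYouGoPolicy and findMerge are
      hypothetical stand-ins):

        import java.util.List;

        /** Placeholder for one flushed segment at some level. */
        class Segment {}

        class PayAsYouGoPolicy {
          private final int mergeFactor = 10;

          /**
           * levelSegments holds the segments at one level, oldest first.
           * Returns the sublist to merge now, or null to keep accumulating.
           */
          List<Segment> findMerge(List<Segment> levelSegments) {
            // Wait until 2X mergeFactor segments pile up at this level,
            // then merge only the oldest mergeFactor of them; no single
            // addDocument call ever forces a full cascade.
            if (levelSegments.size() >= 2 * mergeFactor) {
              return levelSegments.subList(0, mergeFactor);
            }
            return null; // pay as you go: nothing to merge yet
          }
        }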

      With this new merge policy, an index that's a bit bigger than a
      current "optimization point" would then have a lower amortized cost
      per document. Plus the merge cost is less "bunched up" and less "pay
      it forward": instead you pay for what you are actually using.

      We can start by creating this merge policy (probably combined with
      the "by size not by doc count" segment level computation from
      LUCENE-845) and then later decide whether we should make it the
      default merge policy.

      Attachments

      1. LUCENE-854.patch
        110 kB
        Michael McCandless

        Activity

        Robert Muir added a comment -

        Bulk closing for 3.2
        Uwe Schindler added a comment -

        Nice videos, have already seen them on twitter yesterday g
        Michael McCandless added a comment -

        I put up a blog post showing a movie of how TieredMP differs from LogByteSizeMP: http://chbits.blogspot.com/2011/02/visualizing-lucenes-segment-merges.html
        Michael McCandless added a comment -

        I created a new merge policy, to take advantage of non-contiguous merging (LUCENE-1076) and fix certain limitations of LogMergePolicy.

        The new policy does not support contiguous merging; it always merges by byte size, pro-rated by the percentage of deletes.

        The policy's core logic is similar to LogMP, in that it tries to merge roughly equal sized segments at once, maxMergeAtOnce (renamed from mergeFactor) at a time, resulting in the usual exponential staircase pattern when you feed it roughly equal sized segments.

        You configure the approx max merged segment size (unlike LogMP where you configure the max to-be-merged size, which was always a source of confusion!). Unlike LogMP, when segments are getting close to being too large, the new policy will merge fewer segs, eg down to merging pairwise, to reach approx the max allowed size. This is important since it makes that setting more "accurate"; I now default it to 5 GB (vs LogMP's 2 GB).

        There is a separate maxMergeAtOnceExplicit that controls "explicit" merging (ie, app calls optimize or expungeDeletes, and maybe in the future also addIndexes); I defaulted it to 30. There is no max segment size for optimize.

        The big difference vs LogMP is that the new policy does not "over-merge", meaning it does not "pay it forward"/forcefully cascade the way LogMP does today. This fixes the "inadvertent optimize" that LogMP does.

        For an index of any given size, the new policy computes a budget of how many segments that index is allowed to have (ie, it enumerates the steps in the staircase, based on mergeAtOnce, the [floored] min segment size, and the total bytes in the index); then, if the index is over budget, it picks the least-cost merge. This results in a smoother progression of segment counts over time.
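
        That budget enumeration might look roughly like this (my
        paraphrase of the description above, not the patch code):

          class SegmentBudget {
            // Enumerate the staircase bottom-up: segsPerTier segments are
            // allowed per tier, and each tier's segments are mergeAtOnce
            // times larger than the previous tier's.
            static long allowedSegmentCount(long totalIndexBytes,
                                            long floorSegmentBytes,
                                            int mergeAtOnce, int segsPerTier) {
              long levelSize = floorSegmentBytes; // [floored] min segment size
              long bytesLeft = totalIndexBytes;
              long allowed = 0;
              while (true) {
                double segsAtLevel = (double) bytesLeft / levelSize;
                if (segsAtLevel < segsPerTier) {
                  allowed += (long) Math.ceil(segsAtLevel); // last, partial tier
                  break;
                }
                allowed += segsPerTier;       // one full tier at this size
                bytesLeft -= (long) segsPerTier * levelSize;
                levelSize *= mergeAtOnce;     // next tier's segments are bigger
              }
              return allowed; // more segments than this => pick a merge
            }
          }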

        There is a new configuration, segmentsPerTier, that lets you control how many segments per level you can "tolerate". This is a nice knob for trading off merge cost vs search cost. It defaults to 10, which matches the staircase pattern that LogMP produces, but you can now control the "width" of the stairs separately from how many segments are merged at once for non-explicit merges.

        It has useCompoundFile and noCFSRatio just like LogMP.

        It has a new setting "expungeDeletesPctAllowed", default 10%, which allows expungeDeletes to skip merging a segment if it has < 10% deletions.
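
        Pulling those settings together, configuring the new policy might
        look like the following (setter names are assumed from the
        settings listed above, as of the 3.2-era API; treat this as a
        sketch, not the committed interface):

          import org.apache.lucene.index.IndexWriterConfig;
          import org.apache.lucene.index.TieredMergePolicy;

          class TieredMPSetup {
            static void apply(IndexWriterConfig iwc) {
              TieredMergePolicy tmp = new TieredMergePolicy();
              tmp.setMaxMergeAtOnce(10);             // renamed from mergeFactor
              tmp.setMaxMergeAtOnceExplicit(30);     // optimize / expungeDeletes
              tmp.setMaxMergedSegmentMB(5 * 1024);   // approx max merged size, ~5 GB
              tmp.setSegmentsPerTier(10.0);          // tolerated segments per level
              tmp.setExpungeDeletesPctAllowed(10.0); // skip segs with < 10% deletes
              tmp.setUseCompoundFile(true);          // same knobs as LogMP
              tmp.setNoCFSRatio(0.1);                // illustrative value
              iwc.setMergePolicy(tmp);
            }
          }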

        I think we should keep LogMergePolicy available for apps that want contiguous merging, merging by doc count, no pro-rating by deletions, or a max segment size enforced during optimize. But, with this, I'd remove the non-contiguous support for LogMergePolicy that was added under LUCENE-1076, and make this new MP the default one.


          People

          • Assignee:
            Michael McCandless
          • Reporter:
            Michael McCandless
          • Votes:
            1
          • Watchers:
            4
