Lucene - Core / LUCENE-845

If you "flush by RAM usage" then IndexWriter may over-merge

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 2.1
    • Fix Version/s: 2.3
    • Component/s: core/index
    • Labels: None
    • Lucene Fields: New

      Description

      I think a good way to maximize performance of Lucene's indexing for a
      given amount of RAM is to flush (writer.flush()) the added documents
      whenever the RAM usage (writer.ramSizeInBytes()) has crossed the max
      RAM you can afford.
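To illustrate why this interacts badly with the merge policy, here is a minimal, self-contained sketch (plain Java, no Lucene dependency, with invented per-document RAM costs) simulating the flush-by-RAM policy. Because documents vary in size, the number of docs per flush varies, so a merge policy that infers segment levels from a fixed maxBufferedDocs count sees segments of unexpected sizes:

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Simulates "flush by RAM usage": buffer documents until their estimated
 * RAM cost crosses a budget, then flush. The per-flush doc counts vary
 * with document size, which is what confuses a merge policy that assumes
 * every level-0 segment holds maxBufferedDocs documents.
 */
public class FlushByRamSim {

    /** Returns the number of docs in each flushed "segment". */
    static List<Integer> flushSizes(int[] docRamCosts, int ramBudget) {
        List<Integer> sizes = new ArrayList<>();
        int bufferedRam = 0;   // analogous to writer.ramSizeInBytes()
        int bufferedDocs = 0;
        for (int cost : docRamCosts) {
            bufferedRam += cost;
            bufferedDocs++;
            if (bufferedRam >= ramBudget) {  // analogous to writer.flush()
                sizes.add(bufferedDocs);
                bufferedRam = 0;
                bufferedDocs = 0;
            }
        }
        if (bufferedDocs > 0) {
            sizes.add(bufferedDocs);  // final partial flush on close
        }
        return sizes;
    }

    public static void main(String[] args) {
        // Five small docs, two large docs, three small docs: same RAM
        // budget per flush, wildly different doc counts per segment.
        int[] costs = {10, 10, 10, 10, 10, 50, 50, 10, 10, 10};
        System.out.println(flushSizes(costs, 50));  // prints [5, 1, 1, 3]
    }
}
```

All four flushes here used the same RAM budget, yet the resulting "segments" hold 5, 1, 1, and 3 docs, so doc count alone says nothing about a segment's level.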

      But, this can confuse the merge policy and cause over-merging, unless
      you set maxBufferedDocs properly.

      This is because the merge policy looks at the current maxBufferedDocs
      to figure out which segments are level 0 (first flushed) or level 1
      (merged from <mergeFactor> level 0 segments).

      I'm not sure how to fix this. Maybe we can look at the net size (in
      bytes) of a segment and "infer" its level from that? Still, we would
      have to be resilient to the application suddenly increasing the
      allowed RAM.

      The good news is that to work around this bug I think you just need
      to ensure that your maxBufferedDocs is less than mergeFactor *
      typical-number-of-docs-flushed.
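The workaround condition above can be written as a small sanity check. This is a sketch only: the helper name and the sample numbers are illustrative and not part of Lucene's API.

```java
/**
 * Checks the workaround suggested above: when flushing by RAM usage,
 * keep maxBufferedDocs below mergeFactor * typical-number-of-docs-flushed
 * so the merge policy's level inference stays consistent.
 */
public class MergeConfigCheck {

    static boolean safeMaxBufferedDocs(int maxBufferedDocs,
                                       int mergeFactor,
                                       int typicalDocsPerFlush) {
        return maxBufferedDocs < mergeFactor * typicalDocsPerFlush;
    }

    public static void main(String[] args) {
        // e.g. mergeFactor = 10, roughly 1000 docs per RAM-triggered flush:
        System.out.println(safeMaxBufferedDocs(1000, 10, 1000));   // prints true
        System.out.println(safeMaxBufferedDocs(20000, 10, 1000));  // prints false: risks over-merging
    }
}
```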

    Attachments

    1. LUCENE-845.patch (13 kB, Michael McCandless)


    People

    • Assignee: mikemccand (Michael McCandless)
    • Reporter: mikemccand (Michael McCandless)
    • Votes: 0
    • Watchers: 3
