Lucene - Core / LUCENE-845

If you "flush by RAM usage" then IndexWriter may over-merge

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 2.1
    • Fix Version/s: 2.3
    • Component/s: core/index
    • Labels: None
    • Lucene Fields: New

      Description

      I think a good way to maximize performance of Lucene's indexing for a
      given amount of RAM is to flush (writer.flush()) the added documents
      whenever the RAM usage (writer.ramSizeInBytes()) has crossed the max
      RAM you can afford.
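To illustrate why this interacts badly with the merge policy, here is a minimal, self-contained sketch (plain Java, no Lucene dependency, with invented per-document RAM costs) simulating the flush-by-RAM policy. Because documents vary in size, the number of docs per flush varies, so a merge policy that infers segment levels from a fixed maxBufferedDocs count sees segments of unexpected sizes:

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Simulates "flush by RAM usage": buffer documents until their estimated
 * RAM cost crosses a budget, then flush. The per-flush doc counts vary
 * with document size, which is what confuses a merge policy that assumes
 * every level-0 segment holds maxBufferedDocs documents.
 */
public class FlushByRamSim {

    /** Returns the number of docs in each flushed "segment". */
    static List<Integer> flushSizes(int[] docRamCosts, int ramBudget) {
        List<Integer> sizes = new ArrayList<>();
        int bufferedRam = 0;   // analogous to writer.ramSizeInBytes()
        int bufferedDocs = 0;
        for (int cost : docRamCosts) {
            bufferedRam += cost;
            bufferedDocs++;
            if (bufferedRam >= ramBudget) {  // analogous to writer.flush()
                sizes.add(bufferedDocs);
                bufferedRam = 0;
                bufferedDocs = 0;
            }
        }
        if (bufferedDocs > 0) {
            sizes.add(bufferedDocs);  // final partial flush on close
        }
        return sizes;
    }

    public static void main(String[] args) {
        // Five small docs, two large docs, three small docs: same RAM
        // budget per flush, wildly different doc counts per segment.
        int[] costs = {10, 10, 10, 10, 10, 50, 50, 10, 10, 10};
        System.out.println(flushSizes(costs, 50));  // prints [5, 1, 1, 3]
    }
}
```

All four flushes here used the same RAM budget, yet the resulting "segments" hold 5, 1, 1, and 3 docs, so doc count alone says nothing about a segment's level.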

      But, this can confuse the merge policy and cause over-merging, unless
      you set maxBufferedDocs properly.

      This is because the merge policy looks at the current maxBufferedDocs
      to figure out which segments are level 0 (first flushed) or level 1
      (merged from <mergeFactor> level 0 segments).

      I'm not sure how to fix this. Maybe we can look at the net size (in
      bytes) of a segment and "infer" its level from that? Still, we would
      have to be resilient to the application suddenly increasing the
      allowed RAM.

      The good news is that to work around this bug I think you just need
      to ensure that your maxBufferedDocs is less than mergeFactor *
      typical-number-of-docs-flushed.
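The workaround condition above can be written as a small sanity check. This is a sketch only: the helper name and the sample numbers are illustrative and not part of Lucene's API.

```java
/**
 * Checks the workaround suggested above: when flushing by RAM usage,
 * keep maxBufferedDocs below mergeFactor * typical-number-of-docs-flushed
 * so the merge policy's level inference stays consistent.
 */
public class MergeConfigCheck {

    static boolean safeMaxBufferedDocs(int maxBufferedDocs,
                                       int mergeFactor,
                                       int typicalDocsPerFlush) {
        return maxBufferedDocs < mergeFactor * typicalDocsPerFlush;
    }

    public static void main(String[] args) {
        // e.g. mergeFactor = 10, roughly 1000 docs per RAM-triggered flush:
        System.out.println(safeMaxBufferedDocs(1000, 10, 1000));   // prints true
        System.out.println(safeMaxBufferedDocs(20000, 10, 1000));  // prints false: risks over-merging
    }
}
```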

    Attachments

    1. LUCENE-845.patch (13 kB, Michael McCandless)


    People

    • Assignee: mikemccand (Michael McCandless)
    • Reporter: mikemccand (Michael McCandless)
    • Votes: 0
    • Watchers: 3
