Lucene - Core
  LUCENE-845

If you "flush by RAM usage" then IndexWriter may over-merge

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 2.1
    • Fix Version/s: 2.3
    • Component/s: core/index
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      I think a good way to maximize performance of Lucene's indexing for a
      given amount of RAM is to flush (writer.flush()) the added documents
      whenever the RAM usage (writer.ramSizeInBytes()) has crossed the max
      RAM you can afford.
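
      A minimal sketch of this flush-by-RAM loop follows. To keep it
      runnable without Lucene on the classpath, it assumes a hypothetical
      RamTrackingWriter interface that mirrors only the calls named above
      (flush() and ramSizeInBytes(), plus addDocument), and a toy
      FakeWriter that counts one byte per character. Neither type is part
      of Lucene.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the IndexWriter calls the description relies on.
interface RamTrackingWriter {
    void addDocument(String doc);
    long ramSizeInBytes();
    void flush();
}

// Toy in-memory writer, used only to make the sketch runnable without Lucene.
class FakeWriter implements RamTrackingWriter {
    private long bufferedBytes = 0;
    private int bufferedDocs = 0;
    // Number of documents written by each flush, i.e. the size of each new segment.
    final List<Integer> flushedDocCounts = new ArrayList<>();

    public void addDocument(String doc) {
        bufferedBytes += doc.length(); // crude estimate: one byte per character
        bufferedDocs++;
    }

    public long ramSizeInBytes() {
        return bufferedBytes;
    }

    public void flush() {
        if (bufferedDocs > 0) {
            flushedDocCounts.add(bufferedDocs);
        }
        bufferedBytes = 0;
        bufferedDocs = 0;
    }
}

class FlushByRam {
    // Add every document, flushing whenever buffered RAM reaches maxRamBytes.
    static void indexAll(RamTrackingWriter writer, List<String> docs, long maxRamBytes) {
        for (String doc : docs) {
            writer.addDocument(doc);
            if (writer.ramSizeInBytes() >= maxRamBytes) {
                writer.flush();
            }
        }
        writer.flush(); // write out whatever is still buffered
    }

    public static void main(String[] args) {
        FakeWriter writer = new FakeWriter();
        List<String> docs = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            docs.add("0123456789"); // ten documents of 10 characters each
        }
        indexAll(writer, docs, 30);
        System.out.println(writer.flushedDocCounts);
    }
}
```

      The point to notice is that the number of documents per flushed
      segment now depends on document size and the RAM budget, not on
      maxBufferedDocs.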

      But this can confuse the merge policy and cause over-merging unless
      you set maxBufferedDocs properly.

      This is because the merge policy looks at the current maxBufferedDocs
      to figure out which segments are level 0 (first flushed) or level 1
      (merged from <mergeFactor> level 0 segments).
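
      To see why this goes wrong, here is a simplified model of level
      inference by document count. It is not the actual IndexWriter code;
      it only assumes that level L covers segments with at most
      maxBufferedDocs * mergeFactor^L documents. The class and method
      names are hypothetical.

```java
class DocCountLevels {
    // Smallest level L such that docCount <= maxBufferedDocs * mergeFactor^L.
    // Simplified model only; assumes maxBufferedDocs >= 1 and mergeFactor >= 2.
    static int level(long docCount, long maxBufferedDocs, int mergeFactor) {
        if (maxBufferedDocs < 1 || mergeFactor < 2) {
            throw new IllegalArgumentException("need maxBufferedDocs >= 1 and mergeFactor >= 2");
        }
        int level = 0;
        long upperBound = maxBufferedDocs;
        while (docCount > upperBound) {
            upperBound *= mergeFactor;
            level++;
        }
        return level;
    }

    public static void main(String[] args) {
        long maxBufferedDocs = 1000; // configured value
        int mergeFactor = 10;
        long flushed = 100;          // docs actually written per RAM-triggered flush
        long merged = flushed * mergeFactor; // result of merging ten flushed segments

        System.out.println("flushed segment level: " + level(flushed, maxBufferedDocs, mergeFactor));
        System.out.println("merged segment level:  " + level(merged, maxBufferedDocs, mergeFactor));
    }
}
```

      In this model a segment merged from ten 100-document flushes holds
      1000 documents and is still classified as level 0, so it is eligible
      to be merged again alongside freshly flushed segments.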

      I'm not sure how to fix this. Maybe we can look at the net size (in
      bytes) of a segment and "infer" its level from this? Still, we would
      have to be resilient to the application suddenly increasing the RAM
      allowed.
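
      One possible reading of "infer level from size", sketched under
      assumptions: bucket each segment by floor(log_mergeFactor(bytes)).
      This is an illustration of the idea only, not necessarily what the
      attached patch does; the class and method names are hypothetical.

```java
class ByteSizeLevels {
    // floor(log base mergeFactor of sizeInBytes), computed with integer
    // division so the result is exact at bucket boundaries.
    static int level(long sizeInBytes, int mergeFactor) {
        if (sizeInBytes < 1 || mergeFactor < 2) {
            throw new IllegalArgumentException("need sizeInBytes >= 1 and mergeFactor >= 2");
        }
        int level = 0;
        while (sizeInBytes >= mergeFactor) {
            sizeInBytes /= mergeFactor;
            level++;
        }
        return level;
    }

    public static void main(String[] args) {
        int mergeFactor = 10;
        long[] segmentSizes = {5_000L, 40_000L, 52_000L, 600_000L};
        for (long size : segmentSizes) {
            System.out.println(size + " bytes -> level " + level(size, mergeFactor));
        }
    }
}
```

      Because the bucket depends only on the segment's own size, it does
      not change when the RAM budget changes. The fixed bucket boundaries
      are a known weakness of this naive version: two segments of nearly
      equal size can land in different levels.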

      The good news is that, to work around this bug, I think you just
      need to ensure that your maxBufferedDocs is less than mergeFactor *
      typical-number-of-docs-flushed.
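
      The workaround condition is plain arithmetic and can be checked
      before indexing starts. A small sketch, where typicalDocsFlushed is
      an estimate the application has to supply itself and the class and
      method names are hypothetical:

```java
class WorkaroundCheck {
    // True when maxBufferedDocs < mergeFactor * typicalDocsFlushed.
    // The product is computed as a long to avoid int overflow.
    static boolean satisfiesWorkaround(int maxBufferedDocs, int mergeFactor, int typicalDocsFlushed) {
        return (long) maxBufferedDocs < (long) mergeFactor * typicalDocsFlushed;
    }

    public static void main(String[] args) {
        int mergeFactor = 10;
        int typicalDocsFlushed = 100; // assumed average docs per RAM-triggered flush
        // 10 * 100 = 1000, so maxBufferedDocs must be below 1000.
        System.out.println(satisfiesWorkaround(1000, mergeFactor, typicalDocsFlushed));
        System.out.println(satisfiesWorkaround(999, mergeFactor, typicalDocsFlushed));
    }
}
```

      With mergeFactor = 10 and about 100 documents per flush, any
      maxBufferedDocs below 1000 meets the condition.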

      1. LUCENE-845.patch
        13 kB
        Michael McCandless

          Activity

          Changes are listed as Field: Original Value -> New Value.

          Mark Thomas made changes -
          Workflow: Default workflow, editable Closed status [ 12564572 ] -> jira [ 12584977 ]
          Mark Thomas made changes -
          Workflow: jira [ 12400235 ] -> Default workflow, editable Closed status [ 12564572 ]
          Michael Busch made changes -
          Status: Resolved [ 5 ] -> Closed [ 6 ]
          Michael McCandless made changes -
          Status: In Progress [ 3 ] -> Resolved [ 5 ]
          Resolution: (none) -> Fixed [ 1 ]
          Michael McCandless made changes -
          Fix Version/s: (none) -> 2.3 [ 12312531 ]
          Michael McCandless made changes -
          Attachment: (none) -> LUCENE-845.patch [ 12363895 ]
          Michael McCandless made changes -
          Status: Open [ 1 ] -> In Progress [ 3 ]
          Michael McCandless made changes -
          Link: (none) -> This issue is blocked by LUCENE-847 [ LUCENE-847 ]
          Michael McCandless made changes -
          Link: (none) -> This issue blocks LUCENE-843 [ LUCENE-843 ]
          Michael McCandless created issue -

            People

            • Assignee:
              Michael McCandless
              Reporter:
              Michael McCandless
            • Votes:
              0
              Watchers:
              3
