  1. Lucene - Core
  2. LUCENE-843

improve how IndexWriter uses RAM to buffer added documents

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 2.2
    • Fix Version/s: 2.3
    • Component/s: core/index
    • Labels:
      None
    • Lucene Fields:
      New, Patch Available

      Description

      I'm working on a new class (MultiDocumentWriter) that writes more than
      one document directly into a single Lucene segment, more efficiently
      than the current approach.

      This only affects the creation of an initial segment from added
      documents. I haven't changed anything after that, e.g., how segments
      are merged.

      The basic ideas are:

      • Write stored fields and term vectors directly to disk (don't
        use up RAM for these).
      • Gather posting lists & term infos in RAM, but periodically do
        in-RAM merges. Once RAM is full, flush buffers to disk (and
        merge them later when it's time to make a real segment).
      • Recycle objects/buffers to reduce GC time and pressure.
      • Various other optimizations.
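
      As a rough illustration of the second idea above (gather postings in
      RAM, flush when a RAM budget is reached, merge the flushed runs later
      into a real segment), here is a minimal sketch. It is not the
      MultiDocumentWriter from the patch: the class name
      RamBufferedPostings, its methods, and the byte estimates are all
      hypothetical, and an in-memory list stands in for the on-disk runs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical illustration only; not the MultiDocumentWriter from the patch. */
class RamBufferedPostings {
    private final long ramBudgetBytes;
    private long ramUsedBytes = 0;

    // term -> doc IDs; a TreeMap keeps terms sorted, so a flushed run is already ordered.
    private TreeMap<String, List<Integer>> buffer = new TreeMap<>();

    // Stand-in for runs flushed to disk (assumption: a real writer would use files).
    private final List<TreeMap<String, List<Integer>>> flushedRuns = new ArrayList<>();

    RamBufferedPostings(long ramBudgetBytes) {
        this.ramBudgetBytes = ramBudgetBytes;
    }

    /** Buffers one document's terms; doc IDs are assumed to arrive in increasing order. */
    void addDocument(int docId, String... terms) {
        for (String term : terms) {
            List<Integer> docs = buffer.get(term);
            if (docs == null) {
                docs = new ArrayList<>();
                buffer.put(term, docs);
                ramUsedBytes += 2L * term.length(); // crude estimate: 2 bytes per char
            }
            // Record each doc ID once per term, even if the term repeats in the document.
            if (docs.isEmpty() || docs.get(docs.size() - 1) != docId) {
                docs.add(docId);
                ramUsedBytes += 4; // crude estimate: 4 bytes per doc ID
            }
        }
        if (ramUsedBytes >= ramBudgetBytes) {
            flush();
        }
    }

    /** Moves the current in-RAM buffer to the list of flushed runs. */
    void flush() {
        if (buffer.isEmpty()) {
            return;
        }
        flushedRuns.add(buffer);
        buffer = new TreeMap<>();
        ramUsedBytes = 0;
    }

    int flushedRunCount() {
        return flushedRuns.size();
    }

    /** Merges all flushed runs (and any buffered postings) into one postings map. */
    Map<String, List<Integer>> mergeIntoSegment() {
        flush();
        TreeMap<String, List<Integer>> segment = new TreeMap<>();
        for (TreeMap<String, List<Integer>> run : flushedRuns) {
            for (Map.Entry<String, List<Integer>> e : run.entrySet()) {
                // Runs are visited in flush order, so each merged list stays sorted by doc ID.
                segment.computeIfAbsent(e.getKey(), k -> new ArrayList<>()).addAll(e.getValue());
            }
        }
        flushedRuns.clear();
        return segment;
    }
}
```

      The sketch leaves out stored fields, term vectors, positions, and
      buffer recycling; it only shows how a RAM budget, rather than a
      document count, can decide when buffered postings are flushed.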

      Some of these changes are similar to how KinoSearch builds a segment.
      However, I haven't changed Lucene's file format or added a
      requirement for a global fields schema.

      So far the only externally visible change is a new method,
      "setRAMBufferSize", in IndexWriter (setMaxBufferedDocs is
      deprecated), so that it flushes according to RAM usage rather than
      after a fixed number of documents has been added.
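
      To see why flushing by RAM usage can behave better than flushing by
      document count when document sizes vary, consider the small
      simulation below. It does not use the Lucene API: FlushTriggerDemo
      and its two methods are hypothetical names, and the document sizes
      are made-up inputs chosen only to show the effect.

```java
/** Hypothetical sketch: compares peak buffered bytes under two flush triggers. */
class FlushTriggerDemo {

    /** Peak bytes buffered when flushing after every maxBufferedDocs documents. */
    static long peakRamByDocCount(int[] docSizesBytes, int maxBufferedDocs) {
        long used = 0;
        long peak = 0;
        int buffered = 0;
        for (int size : docSizesBytes) {
            used += size;
            buffered++;
            peak = Math.max(peak, used);
            if (buffered >= maxBufferedDocs) {
                used = 0;       // flush: buffer is emptied
                buffered = 0;
            }
        }
        return peak;
    }

    /** Peak bytes buffered when flushing as soon as usage reaches ramBudgetBytes. */
    static long peakRamByRamBudget(int[] docSizesBytes, long ramBudgetBytes) {
        long used = 0;
        long peak = 0;
        for (int size : docSizesBytes) {
            used += size;
            peak = Math.max(peak, used);
            if (used >= ramBudgetBytes) {
                used = 0;       // flush: buffer is emptied
            }
        }
        return peak;
    }

    public static void main(String[] args) {
        // Four tiny documents followed by four large ones (made-up sizes, in bytes).
        int[] sizes = {10, 10, 10, 10, 4000, 4000, 4000, 4000};
        System.out.println("peak, flush every 4 docs:   " + peakRamByDocCount(sizes, 4));
        System.out.println("peak, flush at 4000 bytes:  " + peakRamByRamBudget(sizes, 4000));
    }
}
```

      With a fixed document count, peak memory depends on how large the
      buffered documents happen to be; with a RAM budget, the peak stays
      close to the budget plus at most one document.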

        Attachments

        1. LUCENE-843.patch
          141 kB
          Michael McCandless
        2. LUCENE-843.take2.patch
          148 kB
          Michael McCandless
        3. LUCENE-843.take3.patch
          156 kB
          Michael McCandless
        4. LUCENE-843.take4.patch
          188 kB
          Michael McCandless
        5. LUCENE-843.take5.patch
          239 kB
          Michael McCandless
        6. LUCENE-843.take6.patch
          210 kB
          Michael McCandless
        7. LUCENE-843.take7.patch
          189 kB
          Michael McCandless
        8. LUCENE-843.take8.patch
          203 kB
          Michael McCandless
        9. LUCENE-843.take9.patch
          204 kB
          Michael McCandless
        10. index.presharedstores.cfs.zip
          2 kB
          Michael McCandless
        11. index.presharedstores.nocfs.zip
          5 kB
          Michael McCandless

              People

              • Assignee: Michael McCandless (mikemccand)
              • Reporter: Michael McCandless (mikemccand)
              • Votes: 5
              • Watchers: 4
