Lucene - Core / LUCENE-843

improve how IndexWriter uses RAM to buffer added documents

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 2.2
    • Fix Version/s: 2.3
    • Component/s: core/index
    • Labels:
      None
    • Lucene Fields:
      New, Patch Available

      Description

      I'm working on a new class (MultiDocumentWriter) that writes more than
      one document directly into a single Lucene segment, more efficiently
      than the current approach.

      This only affects the creation of an initial segment from added
      documents. I haven't changed anything after that, e.g., how segments
      are merged.

      The basic ideas are:

      • Write stored fields and term vectors directly to disk (don't
        use up RAM for these).
      • Gather posting lists & term infos in RAM, but periodically do
        in-RAM merges. Once RAM is full, flush buffers to disk (and
        merge them later when it's time to make a real segment).
      • Recycle objects/buffers to reduce time/stress in GC.
      • Various other optimizations.
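
      The buffering idea in the list above can be sketched as a toy in
      plain Java. This is not the MultiDocumentWriter code from the patch:
      the class name, the per-posting byte estimate, and the in-memory
      "flushed runs" list (standing in for files on disk) are all
      assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Toy illustration only; hypothetical names, not Lucene's implementation. */
class RamBoundedPostingsBuffer {
    // Assumed, made-up cost of one buffered posting, in bytes.
    private static final long BYTES_PER_POSTING = 16;

    private final long ramBudgetBytes;
    // term -> doc IDs containing the term, kept sorted by term.
    private final Map<String, List<Integer>> postings = new TreeMap<>();
    // Stand-in for runs written to disk once the RAM budget is reached.
    private final List<Map<String, List<Integer>>> flushedRuns = new ArrayList<>();
    private long bytesUsed = 0;
    private int nextDocId = 0;

    RamBoundedPostingsBuffer(long ramBudgetBytes) {
        this.ramBudgetBytes = ramBudgetBytes;
    }

    /** Buffers one document's terms; flushes when the estimate hits the budget. */
    void addDocument(String... terms) {
        int docId = nextDocId++;
        for (String term : terms) {
            List<Integer> docs = postings.get(term);
            if (docs == null) {
                docs = new ArrayList<>();
                postings.put(term, docs);
                bytesUsed += 2L * term.length(); // rough cost of the term text
            }
            // Record each document at most once per term.
            if (docs.isEmpty() || docs.get(docs.size() - 1) != docId) {
                docs.add(docId);
                bytesUsed += BYTES_PER_POSTING;
            }
        }
        if (bytesUsed >= ramBudgetBytes) {
            flush();
        }
    }

    /** Copies the buffered postings into a new run, then reuses the buffer. */
    void flush() {
        if (postings.isEmpty()) {
            return;
        }
        Map<String, List<Integer>> run = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : postings.entrySet()) {
            run.put(e.getKey(), new ArrayList<>(e.getValue()));
        }
        flushedRuns.add(run);
        postings.clear(); // the same map object is recycled for the next run
        bytesUsed = 0;
    }

    int flushedRunCount() {
        return flushedRuns.size();
    }

    long bytesUsed() {
        return bytesUsed;
    }
}
```

      In this sketch the runs would still need to be merged into a real
      segment later, which the toy leaves out.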

      Some of these changes are similar to how KinoSearch builds a segment.
      But, I haven't made any changes to Lucene's file format nor added
      requirements for a global fields schema.

      So far the only externally visible change is a new method
      "setRAMBufferSize" in IndexWriter (and setMaxBufferedDocs is
      deprecated) so that it flushes according to RAM usage and not a fixed
      number of documents added.
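
      To see why the trigger matters, here is a small self-contained
      simulation. It does not call IndexWriter at all: the class, method
      names, and document sizes are hypothetical. It only compares the
      peak number of buffered bytes under a document-count trigger and
      under a RAM-usage trigger.

```java
/** Toy simulation only; hypothetical names, no Lucene classes involved. */
class FlushPolicyDemo {

    /** Peak buffered bytes when flushing after every maxBufferedDocs documents. */
    static long peakBytesWithDocCountTrigger(long[] docSizes, int maxBufferedDocs) {
        long buffered = 0;
        long peak = 0;
        int docs = 0;
        for (long size : docSizes) {
            buffered += size;
            docs++;
            peak = Math.max(peak, buffered);
            if (docs >= maxBufferedDocs) { // flush on document count
                buffered = 0;
                docs = 0;
            }
        }
        return peak;
    }

    /** Peak buffered bytes when flushing once the buffer reaches ramBudgetBytes. */
    static long peakBytesWithRamTrigger(long[] docSizes, long ramBudgetBytes) {
        long buffered = 0;
        long peak = 0;
        for (long size : docSizes) {
            buffered += size;
            peak = Math.max(peak, buffered);
            if (buffered >= ramBudgetBytes) { // flush on RAM usage
                buffered = 0;
            }
        }
        return peak;
    }

    public static void main(String[] args) {
        // Made-up sizes: three large documents followed by three small ones.
        long[] docSizes = {5000, 5000, 5000, 100, 100, 100};
        System.out.println(peakBytesWithDocCountTrigger(docSizes, 3));
        System.out.println(peakBytesWithRamTrigger(docSizes, 4000));
    }
}
```

      With a document-count trigger the peak depends on how large the
      documents happen to be. With a RAM trigger, the peak in this
      simulation is bounded by the budget plus the size of one document.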

      1. index.presharedstores.cfs.zip
        2 kB
        Michael McCandless
      2. index.presharedstores.nocfs.zip
        5 kB
        Michael McCandless
      3. LUCENE-843.patch
        141 kB
        Michael McCandless
      4. LUCENE-843.take2.patch
        148 kB
        Michael McCandless
      5. LUCENE-843.take3.patch
        156 kB
        Michael McCandless
      6. LUCENE-843.take4.patch
        188 kB
        Michael McCandless
      7. LUCENE-843.take5.patch
        239 kB
        Michael McCandless
      8. LUCENE-843.take6.patch
        210 kB
        Michael McCandless
      9. LUCENE-843.take7.patch
        189 kB
        Michael McCandless
      10. LUCENE-843.take8.patch
        203 kB
        Michael McCandless
      11. LUCENE-843.take9.patch
        204 kB
        Michael McCandless

            People

            • Assignee:
              Michael McCandless
              Reporter:
              Michael McCandless
            • Votes:
              5
              Watchers:
              4