Lucene - Core / LUCENE-843

improve how IndexWriter uses RAM to buffer added documents

Details

    • Type: Improvement
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 2.2
    • Fix Version/s: 2.3
    • Component/s: core/index
    • Labels: None
    • Lucene Fields: New, Patch Available

    Description

      I'm working on a new class (MultiDocumentWriter) that writes
      multiple documents directly into a single Lucene segment, more
      efficiently than the current approach of writing each added
      document to its own single-document segment and then merging those.

      This only affects how the initial segment is created from added
      documents. I haven't changed anything after that, e.g., how
      segments are merged.

      The basic ideas are:

      • Write stored fields and term vectors directly to disk (don't
        use up RAM for these).
      • Gather posting lists & term infos in RAM, but periodically do
        in-RAM merges. Once RAM is full, flush buffers to disk (and
        merge them later when it's time to make a real segment).
      • Recycle objects/buffers to reduce GC time and pressure.
      • Various other optimizations.
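
      The postings bullet describes an external-sort pattern: accumulate
      in RAM, spill sorted runs once a budget is reached, and merge the
      runs when the real segment is built. The Java sketch below is a
      hypothetical illustration of that pattern only, not the
      MultiDocumentWriter code from the patch. The class name
      PostingsBuffer, the fixed per-term and per-posting byte costs, and
      keeping flushed runs in memory (instead of temporary files on disk)
      are all assumptions made to keep it self-contained; it also omits
      the periodic in-RAM merges and the term-info data.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;
import java.util.TreeMap;

// Hypothetical sketch, not the patch's MultiDocumentWriter: buffer postings in RAM,
// spill term-sorted runs when an assumed byte budget is reached, then k-way merge them.
// Runs stay in memory here; a real writer would write them to temporary files.
final class PostingsBuffer {
    // Assumed rough RAM costs, used only to decide when to flush.
    private static final long BYTES_PER_NEW_TERM = 64;
    private static final long BYTES_PER_POSTING = 8;
    private final long ramBudgetBytes;
    private final Map<String, List<Integer>> postings = new HashMap<>();
    private final List<TreeMap<String, List<Integer>>> flushedRuns = new ArrayList<>();
    private long bytesUsed = 0;

    PostingsBuffer(long ramBudgetBytes) { this.ramBudgetBytes = ramBudgetBytes; }

    /** Doc IDs must arrive in increasing order, as a writer assigns them. */
    void addDocument(int docId, String... terms) {
        for (String term : terms) {
            List<Integer> docs = postings.get(term);
            if (docs == null) {
                docs = new ArrayList<>();
                postings.put(term, docs);
                bytesUsed += BYTES_PER_NEW_TERM;
            }
            if (docs.isEmpty() || docs.get(docs.size() - 1).intValue() != docId) {
                docs.add(docId); // at most one posting per (term, doc)
                bytesUsed += BYTES_PER_POSTING;
            }
        }
        if (bytesUsed >= ramBudgetBytes) flush();
    }

    /** Sorts the buffered terms into a new run and empties the RAM buffer. */
    void flush() {
        if (postings.isEmpty()) return;
        flushedRuns.add(new TreeMap<>(postings));
        postings.clear();
        bytesUsed = 0;
    }

    int flushedRunCount() { return flushedRuns.size(); }

    /** Iterator over one run, remembering its current entry and run number. */
    private static final class Cursor {
        final int run;
        final Iterator<Map.Entry<String, List<Integer>>> it;
        Map.Entry<String, List<Integer>> current;
        Cursor(int run, Iterator<Map.Entry<String, List<Integer>>> it) {
            this.run = run;
            this.it = it;
            this.current = it.next(); // runs are never empty (see flush)
        }
        boolean advance() {
            if (!it.hasNext()) return false;
            current = it.next();
            return true;
        }
    }

    /** Flushes any remaining postings, then k-way merges all runs by term. */
    TreeMap<String, List<Integer>> mergeRuns() {
        flush();
        // Smallest term first; ties go to the earlier run so doc IDs stay ascending.
        PriorityQueue<Cursor> queue = new PriorityQueue<>((a, b) -> {
            int cmp = a.current.getKey().compareTo(b.current.getKey());
            return cmp != 0 ? cmp : Integer.compare(a.run, b.run);
        });
        for (int i = 0; i < flushedRuns.size(); i++) {
            queue.add(new Cursor(i, flushedRuns.get(i).entrySet().iterator()));
        }
        TreeMap<String, List<Integer>> merged = new TreeMap<>();
        while (!queue.isEmpty()) {
            Cursor c = queue.poll();
            merged.computeIfAbsent(c.current.getKey(), k -> new ArrayList<>())
                  .addAll(c.current.getValue());
            if (c.advance()) queue.add(c);
        }
        return merged;
    }

    public static void main(String[] args) {
        PostingsBuffer buf = new PostingsBuffer(100); // tiny budget to force flushes
        buf.addDocument(0, "apple", "banana");
        buf.addDocument(1, "apple");
        buf.addDocument(2, "banana", "cherry");
        System.out.println(buf.flushedRunCount() + " runs flushed");
        System.out.println(buf.mergeRuns());
    }
}
```

      Running main prints "2 runs flushed" followed by
      "{apple=[0, 1], banana=[0, 2], cherry=[2]}". Breaking priority-queue
      ties by run number is what keeps each merged doc-ID list ascending,
      because documents are numbered in the order they are added.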

      Some of these changes are similar to how KinoSearch builds a segment.
      However, I haven't changed Lucene's file format, nor have I added
      a requirement for a global fields schema.

      So far the only externally visible change is a new method,
      "setRAMBufferSize", in IndexWriter (and setMaxBufferedDocs is
      deprecated), so that IndexWriter flushes according to RAM usage
      rather than after a fixed number of added documents.
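
      To illustrate why flushing by RAM usage behaves differently from
      flushing every N documents, here is a hypothetical Java sketch that
      does not use Lucene's IndexWriter API. The FlushTrigger class, the
      16 MiB limit, the limit of 10 buffered documents and the
      per-document byte estimates are all assumptions. It feeds the same
      mix of small and large documents to both triggers and reports the
      number of flushes and the most bytes ever buffered.

```java
// Hypothetical sketch: a RAM-based flush trigger vs. a fixed doc-count trigger.
// Byte sizes are assumed per-document estimates, not real Lucene accounting.
final class FlushTrigger {
    private final long ramLimitBytes;   // <= 0 disables the RAM-based trigger
    private final int maxBufferedDocs;  // <= 0 disables the doc-count trigger
    private long bufferedBytes = 0;
    private int bufferedDocs = 0;
    private long peakBufferedBytes = 0;
    private int flushCount = 0;

    FlushTrigger(long ramLimitBytes, int maxBufferedDocs) {
        this.ramLimitBytes = ramLimitBytes;
        this.maxBufferedDocs = maxBufferedDocs;
    }

    static FlushTrigger byRam(long limitBytes) { return new FlushTrigger(limitBytes, 0); }
    static FlushTrigger byDocCount(int maxDocs) { return new FlushTrigger(0, maxDocs); }

    /** Records one added document; returns true if it triggered a flush. */
    boolean addDocument(long estimatedBytes) {
        bufferedBytes += estimatedBytes;
        bufferedDocs++;
        peakBufferedBytes = Math.max(peakBufferedBytes, bufferedBytes);
        boolean overRam = ramLimitBytes > 0 && bufferedBytes >= ramLimitBytes;
        boolean overCount = maxBufferedDocs > 0 && bufferedDocs >= maxBufferedDocs;
        if (overRam || overCount) {
            flushCount++; // a real writer would write a segment here
            bufferedBytes = 0;
            bufferedDocs = 0;
            return true;
        }
        return false;
    }

    int flushCount() { return flushCount; }
    long peakBufferedBytes() { return peakBufferedBytes; }

    /** Feeds 100 small (1 KiB) documents followed by 20 large (4 MiB) ones. */
    static void feedMixedWorkload(FlushTrigger t) {
        for (int i = 0; i < 100; i++) t.addDocument(1024);
        for (int i = 0; i < 20; i++) t.addDocument(4L * 1024 * 1024);
    }

    public static void main(String[] args) {
        FlushTrigger ram = byRam(16L * 1024 * 1024);
        FlushTrigger count = byDocCount(10);
        feedMixedWorkload(ram);
        feedMixedWorkload(count);
        // Prints: RAM-based: flushes=5, peak bytes=16879616
        System.out.println("RAM-based: flushes=" + ram.flushCount()
                + ", peak bytes=" + ram.peakBufferedBytes());
        // Prints: count-based: flushes=12, peak bytes=41943040
        System.out.println("count-based: flushes=" + count.flushCount()
                + ", peak bytes=" + count.peakBufferedBytes());
    }
}
```

      With this workload the count-based trigger flushes 10 times while
      the small documents use very little RAM, yet lets the buffer grow
      to 40 MiB once the large documents arrive; the RAM-based trigger
      never exceeds its limit by more than one document's worth.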

      Attachments

        1. index.presharedstores.nocfs.zip (5 kB, Michael McCandless)
        2. index.presharedstores.cfs.zip (2 kB, Michael McCandless)
        3. LUCENE-843.take9.patch (204 kB, Michael McCandless)
        4. LUCENE-843.take8.patch (203 kB, Michael McCandless)
        5. LUCENE-843.take7.patch (189 kB, Michael McCandless)
        6. LUCENE-843.take6.patch (210 kB, Michael McCandless)
        7. LUCENE-843.take5.patch (239 kB, Michael McCandless)
        8. LUCENE-843.take4.patch (188 kB, Michael McCandless)
        9. LUCENE-843.take3.patch (156 kB, Michael McCandless)
        10. LUCENE-843.take2.patch (148 kB, Michael McCandless)
        11. LUCENE-843.patch (141 kB, Michael McCandless)


            People

              Assignee: Michael McCandless
              Reporter: Michael McCandless
              Votes: 5
              Watchers: 4
