Lucene - Core / LUCENE-843

improve how IndexWriter uses RAM to buffer added documents

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 2.2
    • Fix Version/s: 2.3
    • Component/s: core/index
    • Labels:
      None
    • Lucene Fields:
      New, Patch Available

      Description

      I'm working on a new class (MultiDocumentWriter) that writes more than
      one document directly into a single Lucene segment, more efficiently
      than the current approach.

      This only affects the creation of an initial segment from added
      documents. I haven't changed anything after that, e.g., how segments
      are merged.

      The basic ideas are:

      • Write stored fields and term vectors directly to disk (don't
        use up RAM for these).
      • Gather posting lists & term infos in RAM, but periodically do
        in-RAM merges. Once RAM is full, flush buffers to disk (and
        merge them later when it's time to make a real segment).
      • Recycle objects/buffers to reduce GC time and pressure.
      • Various other optimizations.
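
      As an illustration only, the buffering and recycling ideas above can
      be sketched as follows. This is not code from the patch: the class
      RamPostingsBuffer, its byte estimates, and the in-memory "runs" that
      stand in for on-disk flushes are all hypothetical, and stored fields,
      term vectors and in-RAM merges are left out.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch; not the MultiDocumentWriter from the patch.
class RamPostingsBuffer {
    private final long ramBudgetBytes;
    private long ramUsedBytes = 0;

    // term -> doc IDs containing that term, in increasing order
    private final TreeMap<String, List<Integer>> postings = new TreeMap<String, List<Integer>>();

    // Cleared lists kept for reuse, so a flush does not turn them into garbage
    private final ArrayDeque<List<Integer>> freeLists = new ArrayDeque<List<Integer>>();

    // Each flushed run stands in for a partial segment written to disk
    final List<Map<String, int[]>> flushedRuns = new ArrayList<Map<String, int[]>>();

    RamPostingsBuffer(long ramBudgetBytes) {
        this.ramBudgetBytes = ramBudgetBytes;
    }

    long ramUsedBytes() {
        return ramUsedBytes;
    }

    // Doc IDs are assumed to arrive in increasing order.
    void addDocument(int docId, String[] terms) {
        for (String term : terms) {
            List<Integer> docs = postings.get(term);
            if (docs == null) {
                // Reuse a recycled list when one is available
                docs = freeLists.isEmpty() ? new ArrayList<Integer>() : freeLists.pop();
                postings.put(term, docs);
                ramUsedBytes += 2L * term.length(); // rough estimate: 2 bytes per char
            }
            // Record each doc at most once per term
            if (docs.isEmpty() || docs.get(docs.size() - 1) != docId) {
                docs.add(docId);
                ramUsedBytes += 4; // rough estimate: one int per posting
            }
        }
        if (ramUsedBytes >= ramBudgetBytes) {
            flush();
        }
    }

    // Copy the buffered postings into a sorted run, then recycle the lists.
    void flush() {
        if (postings.isEmpty()) {
            return;
        }
        Map<String, int[]> run = new TreeMap<String, int[]>();
        for (Map.Entry<String, List<Integer>> e : postings.entrySet()) {
            List<Integer> docs = e.getValue();
            int[] ids = new int[docs.size()];
            for (int i = 0; i < ids.length; i++) {
                ids[i] = docs.get(i);
            }
            run.put(e.getKey(), ids);
            docs.clear();
            freeLists.push(docs);
        }
        flushedRuns.add(run);
        postings.clear();
        ramUsedBytes = 0;
    }
}
```

      In this sketch a caller adds documents one at a time and calls
      flush() once at the end; merging the flushed runs into a real
      segment would be a separate step.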

      Some of these changes are similar to how KinoSearch builds a segment.
      However, I haven't changed Lucene's file format or added a
      requirement for a global fields schema.

      So far the only externally visible change is a new method,
      "setRAMBufferSize", in IndexWriter (setMaxBufferedDocs is
      deprecated), so that it flushes according to RAM usage rather than
      a fixed number of documents added.
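
      To show why the flush trigger matters, here is a minimal sketch that
      compares flushing by document count with flushing by RAM usage. The
      class FlushTrigger and its factory methods are hypothetical and are
      not part of IndexWriter; document sizes are passed in as plain byte
      counts.

```java
// Hypothetical sketch; these are not Lucene classes.
class FlushTrigger {
    private final int maxBufferedDocs;  // <= 0 disables the doc-count trigger
    private final long ramBufferBytes;  // <= 0 disables the RAM trigger
    private int bufferedDocs = 0;
    private long bufferedBytes = 0;
    private long peakBufferedBytes = 0;
    private int flushCount = 0;

    private FlushTrigger(int maxBufferedDocs, long ramBufferBytes) {
        this.maxBufferedDocs = maxBufferedDocs;
        this.ramBufferBytes = ramBufferBytes;
    }

    // Flush after a fixed number of documents, whatever their size.
    static FlushTrigger byDocCount(int maxBufferedDocs) {
        return new FlushTrigger(maxBufferedDocs, 0);
    }

    // Flush once the buffered bytes reach the given budget.
    static FlushTrigger byRamUsage(long ramBufferBytes) {
        return new FlushTrigger(0, ramBufferBytes);
    }

    void addDocument(long docBytes) {
        bufferedDocs++;
        bufferedBytes += docBytes;
        peakBufferedBytes = Math.max(peakBufferedBytes, bufferedBytes);

        boolean docLimitHit = maxBufferedDocs > 0 && bufferedDocs >= maxBufferedDocs;
        boolean ramLimitHit = ramBufferBytes > 0 && bufferedBytes >= ramBufferBytes;
        if (docLimitHit || ramLimitHit) {
            // A real writer would write the buffered data out here
            flushCount++;
            bufferedDocs = 0;
            bufferedBytes = 0;
        }
    }

    long peakBufferedBytes() {
        return peakBufferedBytes;
    }

    int flushCount() {
        return flushCount;
    }
}
```

      With a mix of small and large documents, the doc-count trigger lets
      peak buffer size vary with document size, while the RAM trigger
      bounds it to the budget plus at most one document.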

      Attachments

      1. index.presharedstores.cfs.zip (2 kB, Michael McCandless)
      2. index.presharedstores.nocfs.zip (5 kB, Michael McCandless)
      3. LUCENE-843.patch (141 kB, Michael McCandless)
      4. LUCENE-843.take2.patch (148 kB, Michael McCandless)
      5. LUCENE-843.take3.patch (156 kB, Michael McCandless)
      6. LUCENE-843.take4.patch (188 kB, Michael McCandless)
      7. LUCENE-843.take5.patch (239 kB, Michael McCandless)
      8. LUCENE-843.take6.patch (210 kB, Michael McCandless)
      9. LUCENE-843.take7.patch (189 kB, Michael McCandless)
      10. LUCENE-843.take8.patch (203 kB, Michael McCandless)
      11. LUCENE-843.take9.patch (204 kB, Michael McCandless)

        Issue Links

          Activity

          Michael McCandless created issue
          Michael McCandless made changes (Field: Original Value -> New Value)
            Status: Open [ 1 ] -> In Progress [ 3 ]
          Michael McCandless made changes
            Attachment: (none) -> LUCENE-843.patch [ 12353973 ]
          Michael McCandless made changes
            Attachment: (none) -> LUCENE-843.take2.patch [ 12354163 ]
          Michael McCandless made changes
            Attachment: (none) -> LUCENE-843.take3.patch [ 12354431 ]
          Michael McCandless made changes
            Attachment: (none) -> LUCENE-843.take4.patch [ 12354752 ]
          Michael McCandless made changes
            Attachment: (none) -> LUCENE-843.take5.patch [ 12356500 ]
          Michael McCandless made changes
            Link: (none) -> This issue is blocked by LUCENE-845 [ LUCENE-845 ]
          Michael McCandless made changes
            Attachment: (none) -> LUCENE-843.take6.patch [ 12357792 ]
          Michael McCandless made changes
            Attachment: (none) -> LUCENE-843.take7.patch [ 12359276 ]
          Michael McCandless made changes
            Attachment: (none) -> LUCENE-843.take8.patch [ 12359906 ]
          Michael McCandless made changes
            Attachment: (none) -> LUCENE-843.take9.patch [ 12360022 ]
          Michael McCandless made changes
            Attachment: (none) -> index.presharedstores.cfs.zip [ 12360213 ]
            Attachment: (none) -> index.presharedstores.nocfs.zip [ 12360214 ]
          Michael McCandless made changes
            Status: In Progress [ 3 ] -> Resolved [ 5 ]
            Fix Version/s: (none) -> 2.3 [ 12312531 ]
            Resolution: (none) -> Fixed [ 1 ]
            Lucene Fields: [New] -> [New, Patch Available]
          Michael McCandless made changes
            Resolution: Fixed [ 1 ] -> (none)
            Status: Resolved [ 5 ] -> Reopened [ 4 ]
            Lucene Fields: [Patch Available, New] -> [New, Patch Available]
          Michael McCandless made changes
            Status: Reopened [ 4 ] -> Resolved [ 5 ]
            Lucene Fields: [Patch Available, New] -> [New, Patch Available]
            Resolution: (none) -> Fixed [ 1 ]
          Grant Ingersoll made changes
            Link: (none) -> This issue is related to SOLR-342 [ SOLR-342 ]
          Michael Busch made changes
            Status: Resolved [ 5 ] -> Closed [ 6 ]
          Mark Thomas made changes
            Workflow: jira [ 12400220 ] -> Default workflow, editable Closed status [ 12564574 ]
          Mark Thomas made changes
            Workflow: Default workflow, editable Closed status [ 12564574 ] -> jira [ 12584981 ]

            People

            • Assignee: Michael McCandless
            • Reporter: Michael McCandless
            • Votes: 5
            • Watchers: 4
