  1. Lucene - Core
  2. LUCENE-2680

Improve how IndexWriter flushes deletes against existing segments

Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 3.1, 4.0-ALPHA
    • Component/s: None
    • Labels: None
    • Lucene Fields: New

    Description

      IndexWriter buffers up all deletes (by Term and Query) and only
      applies them if 1) commit or NRT getReader() is called, or 2) a merge
      is about to kick off.

      We do this because, for a large index, it's very costly to open a
      SegmentReader for every segment in the index. So we defer as long as
      we can. We do it just before merge so that the merge can eliminate
      the deleted docs.

      But most merges are small, yet in a big index we still apply deletes
      to all of the segments, not just the ones being merged, which is very
      wasteful.

      Instead, we should only apply the buffered deletes to the segments
      that are about to be merged, and keep the buffer around for the
      remaining segments.

      I think it's not so hard to do; we'd have to have generations of
      pending deletions, because the newly merged segment doesn't need the
      same buffered deletions applied again. So every time a merge kicks
      off, we pinch off the current set of buffered deletions, open a new
      set (the next generation), and record which segment was created as of
      which generation.
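
      The generation scheme above could be sketched roughly as follows. This
      is a minimal illustration, not IndexWriter's actual code: the class
      GenerationalDeletes, its Segment type, and every method name are
      hypothetical, each document is modelled as a single unique id term, and
      the sketch assumes that a flush also seals the current generation so a
      new segment never sees deletes buffered before it existed.

```java
import java.util.*;

/** Hypothetical sketch of generational buffered deletes; not Lucene's real classes. */
class GenerationalDeletes {

    /** A segment plus the first delete generation that has not been applied to it. */
    static final class Segment {
        final String name;
        final Set<String> liveDocs; // each doc is modelled as one unique id term
        final long nextGen;         // generations >= nextGen are still pending for this segment

        Segment(String name, Collection<String> docs, long nextGen) {
            this.name = name;
            this.liveDocs = new TreeSet<>(docs);
            this.nextGen = nextGen;
        }
    }

    private final List<Segment> segments = new ArrayList<>();
    // generation -> delete terms buffered while that generation was open
    private final TreeMap<Long, Set<String>> pending = new TreeMap<>();
    private long currentGen = 0;
    int segmentsTouched = 0; // how many segments have had deletes resolved so far

    GenerationalDeletes() {
        pending.put(currentGen, new HashSet<>());
    }

    /** Buffer a delete in the currently open generation; nothing is applied yet. */
    void deleteByTerm(String term) {
        pending.get(currentGen).add(term);
    }

    /** Assumption of this sketch: a flush seals the current generation first. */
    Segment flush(String name, String... docs) {
        pinchOff();
        Segment s = new Segment(name, Arrays.asList(docs), currentGen);
        segments.add(s);
        return s;
    }

    /** Apply buffered deletes only to the segments being merged, then merge them. */
    Segment merge(String name, List<Segment> toMerge) {
        List<String> survivors = new ArrayList<>();
        for (Segment s : toMerge) {
            // Only generations this segment has not seen yet are applied.
            for (Set<String> terms : pending.tailMap(s.nextGen, true).values()) {
                s.liveDocs.removeAll(terms);
            }
            segmentsTouched++;
            survivors.addAll(s.liveDocs);
        }
        segments.removeAll(toMerge);
        pinchOff(); // the merged segment starts at the next generation
        Segment merged = new Segment(name, survivors, currentGen);
        segments.add(merged);
        prune();
        return merged;
    }

    int pendingGenerations() {
        return pending.size();
    }

    /** Seal the open generation and open the next one. */
    private void pinchOff() {
        pending.put(++currentGen, new HashSet<>());
    }

    /** Drop generations that every remaining segment has already had applied. */
    private void prune() {
        long min = currentGen;
        for (Segment s : segments) {
            min = Math.min(min, s.nextGen);
        }
        pending.headMap(min, false).clear();
    }
}
```

      With three flushed segments and two buffered deletes, merging only two
      of the segments resolves deletes against those two alone; the delete
      that targets the third segment stays buffered until that segment is
      itself merged.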

      This should be a very sizable gain for large indices that mix in
      deletes, though less so in flex, since opening the terms index is much
      faster there.

      Attachments

        1. LUCENE-2680.patch
          141 kB
          Michael McCandless
        2. LUCENE-2680.patch
          124 kB
          Michael McCandless
        3. LUCENE-2680.patch
          124 kB
          Michael McCandless
        4. LUCENE-2680.patch
          57 kB
          Jason Rutherglen
        5. LUCENE-2680.patch
          56 kB
          Jason Rutherglen
        6. LUCENE-2680.patch
          19 kB
          Jason Rutherglen
        7. LUCENE-2680.patch
          19 kB
          Jason Rutherglen
        8. LUCENE-2680.patch
          9 kB
          Jason Rutherglen
        9. LUCENE-2680.patch
          42 kB
          Jason Rutherglen
        10. LUCENE-2680.patch
          44 kB
          Jason Rutherglen
        11. LUCENE-2680.patch
          43 kB
          Jason Rutherglen
        12. LUCENE-2680.patch
          41 kB
          Jason Rutherglen
        13. LUCENE-2680.patch
          42 kB
          Jason Rutherglen
        14. LUCENE-2680.patch
          45 kB
          Jason Rutherglen
        15. LUCENE-2680.patch
          37 kB
          Jason Rutherglen
        16. LUCENE-2680.patch
          30 kB
          Jason Rutherglen
        17. LUCENE-2680.patch
          33 kB
          Jason Rutherglen

            People

              Assignee: Michael McCandless (mikemccand)
              Reporter: Michael McCandless (mikemccand)
              Votes: 0
              Watchers: 2