Lucene - Core
  1. Lucene - Core
  2. LUCENE-2770

Optimize SegmentMerger to work on atomic (Segment)Readers where possible

    Details

    • Type: Improvement Improvement
    • Status: Closed
    • Priority: Minor Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 3.1, 4.0-ALPHA
    • Component/s: core/index
    • Labels:
      None
    • Lucene Fields:
      New, Patch Available

      Description

      This is a spin-off from LUCENE-2769:

      Currently SegmentMerger has some optimizations when it merges segments that are SegmentReaders (e.g. when doing normal indexing or optimizing). But when you do IndexWriter.addIndexes(IndexReader...) the listed IndexReaders may not really be per-segment. SegmentMerger should track down all passed in reads down to the lowest level (Segment)Reader (or other atomic readers like SlowMultiReaderWrapper) and then merge. We can then remove most MultiFields usage (except term merging itsself) and clean up the code.

      This especially saves lots of memory for merging norms, as no longer the duplicate norms arrays are created when MultiReaders are used!

      1. LUCENE-2770.patch
        11 kB
        Uwe Schindler
      2. LUCENE-2770.patch
        9 kB
        Uwe Schindler
      3. LUCENE-2770.patch
        8 kB
        Uwe Schindler
      4. LUCENE-2770-3x.patch
        7 kB
        Uwe Schindler
      5. LUCENE-2770-3x.patch
        3 kB
        Uwe Schindler
      6. LUCENE-2770-optimizeNormsMerging.patch
        2 kB
        Uwe Schindler

        Issue Links

          Activity

          Hide
          Uwe Schindler added a comment -

          Here the patch for 3.x, really easy here!

          3.x was missing to return null in contrib/misc for getSequentialSubReaders. Further optimizations here could also work per segment. For now we have to keep them "atomic".

          As a side note: Why is contrib/misc's IndexSorter missing in trunk?

          Show
          Uwe Schindler added a comment - Here the patch for 3.x, really easy here! 3.x was missing to return null in contrib/misc for getSequentialSubReaders. Further optimizations here could also work per segment. For now we have to keep them "atomic". As a side note: Why is contrib/misc's IndexSorter missing in trunk?
          Hide
          Uwe Schindler added a comment -

          Updated patch for trunk, else MultiPassIndexSplitter fails when source index has more then one segment.

          Show
          Uwe Schindler added a comment - Updated patch for trunk, else MultiPassIndexSplitter fails when source index has more then one segment.
          Hide
          Uwe Schindler added a comment -

          Final patches, will commit soon.

          The changes in MultiPassIndexSplitter are not yet really needed, but this is because FilterIndexReader is broken in trunk (see related issue). But I add it here, because for consistency we need it.

          This patch also removes the useless closeReaders() method in SegmentMerger. This was used by tests only (incorrectly) and should never be called from production code.

          Show
          Uwe Schindler added a comment - Final patches, will commit soon. The changes in MultiPassIndexSplitter are not yet really needed, but this is because FilterIndexReader is broken in trunk (see related issue). But I add it here, because for consistency we need it. This patch also removes the useless closeReaders() method in SegmentMerger. This was used by tests only (incorrectly) and should never be called from production code.
          Hide
          Michael McCandless added a comment -

          Looks great Uwe!

          Show
          Michael McCandless added a comment - Looks great Uwe!
          Hide
          Uwe Schindler added a comment -

          Committed trunk revision: 1036970
          Committed 3.x revision: 1036971

          Thanks Mike for the help with the stupid bug because of not-cloned TVReader!

          Show
          Uwe Schindler added a comment - Committed trunk revision: 1036970 Committed 3.x revision: 1036971 Thanks Mike for the help with the stupid bug because of not-cloned TVReader!
          Hide
          Uwe Schindler added a comment -

          After a discussion with Robert, I supply a new patch with optrimized norms merging without reallocating and ArrayUtil.oversize() (implied y BytesRef). The fix is now to simply look before starting to merge norms how big the normsBuffer must be. So there is never reallocation needed.

          Will heavy commit now

          Show
          Uwe Schindler added a comment - After a discussion with Robert, I supply a new patch with optrimized norms merging without reallocating and ArrayUtil.oversize() (implied y BytesRef). The fix is now to simply look before starting to merge norms how big the normsBuffer must be. So there is never reallocation needed. Will heavy commit now
          Hide
          Uwe Schindler added a comment -

          Committed optimization in trunk revision: 1037077

          Show
          Uwe Schindler added a comment - Committed optimization in trunk revision: 1037077
          Hide
          Uwe Schindler added a comment -

          Backported optimization to 3.x revision: 1037083

          Show
          Uwe Schindler added a comment - Backported optimization to 3.x revision: 1037083
          Hide
          Grant Ingersoll added a comment -

          Bulk close for 3.1

          Show
          Grant Ingersoll added a comment - Bulk close for 3.1

            People

            • Assignee:
              Uwe Schindler
              Reporter:
              Uwe Schindler
            • Votes:
              0 Vote for this issue
              Watchers:
              0 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development