Lucene - Core / LUCENE-1737

Always use bulk-copy when merging stored fields and term vectors

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 3.1, 4.0-ALPHA
    • Component/s: core/index
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      Lucene has nice optimizations in place during merging of stored fields
      (LUCENE-1043) and term vectors (LUCENE-1120) whereby the bytes are
      bulk-copied to the new segment. This is much faster than decoding and
      rewriting one document at a time.

      However, the optimization is rather brittle: it relies on the mapping
      of field name to field number being the same ("congruent") across the
      segments being merged.

      Unfortunately, the field mapping will be congruent only if the app
      adds the same fields in precisely the same order to each document.
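
To make the congruence problem concrete, here is a minimal sketch, not Lucene's actual code. It assumes a hypothetical per-segment numbering scheme in which each field receives the next free number the first time it is seen; the class and method names (CongruenceDemo, numberFields, congruent) are invented for illustration.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class CongruenceDemo {
    // Hypothetical per-segment numbering: a field gets the next free
    // number the first time it appears in the segment's documents.
    static Map<String, Integer> numberFields(List<List<String>> docs) {
        Map<String, Integer> numbers = new LinkedHashMap<>();
        for (List<String> doc : docs) {
            for (String field : doc) {
                numbers.putIfAbsent(field, numbers.size());
            }
        }
        return numbers;
    }

    // Two mappings are congruent when every field name maps to the same number.
    static boolean congruent(Map<String, Integer> a, Map<String, Integer> b) {
        return a.equals(b);
    }

    public static void main(String[] args) {
        // Same two fields, added in a different order in each segment.
        Map<String, Integer> segA = numberFields(List.of(List.of("title", "body")));
        Map<String, Integer> segB = numberFields(List.of(List.of("body", "title")));
        System.out.println("segment A: " + segA);
        System.out.println("segment B: " + segB);
        System.out.println("congruent: " + congruent(segA, segB));
    }
}
```

Under this assumed scheme the two segments disagree on the number for both fields, so a merge could not bulk-copy their bytes and would have to fall back to decoding and rewriting each document.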

      I think we should fix IndexWriter to assign a given field the same
      field number it was assigned in the past. I.e., when writing a new
      segment, we pre-seed the field numbers based on past segments.
      All other aspects of FieldInfo would remain fully dynamic.
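
The pre-seeding idea can be sketched roughly as follows. This is an illustration under stated assumptions, not the attached patch or IndexWriter's real implementation: GlobalFieldNumbers, numberFor, and numberSegment are hypothetical names, and the sketch keeps a single writer-wide map that every new segment consults.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class GlobalFieldNumbers {
    // Writer-wide map (hypothetical) that remembers every number handed out so far.
    private final Map<String, Integer> nameToNumber = new HashMap<>();

    // Returns the number previously assigned to this field,
    // or assigns the next free number if the field is new.
    public synchronized int numberFor(String fieldName) {
        Integer existing = nameToNumber.get(fieldName);
        if (existing != null) {
            return existing;
        }
        int next = nameToNumber.size();
        nameToNumber.put(fieldName, next);
        return next;
    }

    // Numbers the fields of a new segment, reusing past assignments
    // regardless of the order in which the fields arrive.
    public Map<String, Integer> numberSegment(List<String> fieldsInArrivalOrder) {
        Map<String, Integer> segment = new HashMap<>();
        for (String field : fieldsInArrivalOrder) {
            segment.put(field, numberFor(field));
        }
        return segment;
    }

    public static void main(String[] args) {
        GlobalFieldNumbers global = new GlobalFieldNumbers();
        Map<String, Integer> segA = global.numberSegment(List.of("title", "body"));
        Map<String, Integer> segB = global.numberSegment(List.of("body", "title"));
        System.out.println("same mapping: " + segA.equals(segB));
    }
}
```

One consequence of this design is that a segment containing only some of the known fields ends up with gaps in its numbering, so any code that assumes dense per-segment field numbers would need to tolerate that.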

        Attachments

        1. LUCENE-1737.patch
          16 kB
          Michael McCandless
        2. LUCENE-1737.patch
          4 kB
          Michael McCandless

              People

              • Assignee:
                mikemccand Michael McCandless
                Reporter:
                mikemccand Michael McCandless
              • Votes:
                0
                Watchers:
                2