Details

    • Type: Improvement
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: 2.4.1
    • Fix Version/s: 4.9, 5.0
    • Component/s: core/search
    • Labels: None
    • Lucene Fields: New

      Description

      We'll merge the field caches in RAM as the SegmentReaders are
      merged in IndexWriter (the first cut will work in conjunction
      with IW.getReader). There will be an optional callback to
      determine which fields to merge.
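A minimal sketch of the idea in the description, assuming each source segment exposes its cached int values per field as a plain `int[]` indexed by segment-local doc ID, plus a `BitSet` of deleted docs. `SegmentCaches`, `FieldCacheMerger`, and the `shouldMerge` predicate (standing in for the proposed optional callback) are hypothetical names, not Lucene APIs. Values are concatenated in segment order and deleted docs are skipped, mirroring how a merge compacts doc IDs; a field is merged only if every source segment already has it cached.

```java
import java.util.*;
import java.util.function.Predicate;

// Hypothetical per-segment view of cached values; not a Lucene class.
final class SegmentCaches {
    final Map<String, int[]> intCaches; // field -> value for each segment-local doc ID
    final BitSet deleted;               // segment-local doc IDs deleted at merge time
    final int maxDoc;                   // number of doc IDs in the segment

    SegmentCaches(Map<String, int[]> intCaches, BitSet deleted, int maxDoc) {
        this.intCaches = intCaches;
        this.deleted = deleted;
        this.maxDoc = maxDoc;
    }
}

final class FieldCacheMerger {
    /** Builds the merged segment's int caches; shouldMerge is the (hypothetical) field callback. */
    static Map<String, int[]> merge(List<SegmentCaches> sources, Predicate<String> shouldMerge) {
        Map<String, int[]> merged = new HashMap<>();
        if (sources.isEmpty()) {
            return merged;
        }
        int liveDocs = 0;
        for (SegmentCaches s : sources) {
            liveDocs += s.maxDoc - s.deleted.cardinality();
        }
        for (String field : sources.get(0).intCaches.keySet()) {
            if (!shouldMerge.test(field) || !cachedInAll(sources, field)) {
                continue; // leave this field to be loaded the normal way after the merge
            }
            int[] out = new int[liveDocs];
            int newDoc = 0;
            for (SegmentCaches s : sources) {
                int[] values = s.intCaches.get(field);
                for (int doc = 0; doc < s.maxDoc; doc++) {
                    if (!s.deleted.get(doc)) {
                        out[newDoc++] = values[doc]; // deleted docs get no slot in the merged segment
                    }
                }
            }
            merged.put(field, out);
        }
        return merged;
    }

    private static boolean cachedInAll(List<SegmentCaches> sources, String field) {
        for (SegmentCaches s : sources) {
            if (!s.intCaches.containsKey(field)) {
                return false;
            }
        }
        return true;
    }
}
```

For example, a segment with prices `{10, 20, 30}` and doc 1 deleted, merged with a segment holding `{40, 50}`, yields `{10, 30, 40, 50}` for the merged segment.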

      Attachments

      1. LUCENE-1785.patch (22 kB, Jason Rutherglen)
      2. LUCENE-1785.patch (22 kB, Jason Rutherglen)
      3. LUCENE-1785.patch (18 kB, Jason Rutherglen)
      4. LUCENE-1785.patch (18 kB, Jason Rutherglen)

          Activity

          Uwe Schindler added a comment -

          Move issue to Lucene 4.9.

          Steve Rowe added a comment -

          Bulk move 4.4 issues to 4.5 and 5.0

          Jason Rutherglen added a comment -

          "couldn't we pull a Term(s)Enum from the newly merged segment for that field"

          Yes, I think this makes the most sense, and won't adversely affect performance because term dictionary file access is sequential.
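One way to picture the dead-term pass being agreed on here: walk the merged lookup and the merged segment's terms for the field in lockstep, both in sorted order, keep only the terms that still exist, and remap ordinals. Since both sides are read sequentially, this is a single linear pass. The sketch uses a plain `Iterator<String>` as a stand-in for a TermEnum/TermsEnum over the merged segment; `DeadTermPruner` and `Pruned` are hypothetical, and it assumes both sequences use the same sort order (`String.compareTo` here).

```java
import java.util.*;

final class DeadTermPruner {
    /** Pruned lookup plus the order array remapped to the new ordinals. */
    static final class Pruned {
        final String[] lookup;
        final int[] order;

        Pruned(String[] lookup, int[] order) {
            this.lookup = lookup;
            this.order = order;
        }
    }

    /**
     * lookup: merged, sorted, unique terms with lookup[0] == null ("no term").
     * liveTerms: terms still present in the merged segment for this field,
     * in the same sort order (hypothetical stand-in for a terms enum).
     */
    static Pruned prune(String[] lookup, int[] order, Iterator<String> liveTerms) {
        int[] remap = new int[lookup.length]; // old ordinal -> new ordinal (0 = dropped)
        List<String> kept = new ArrayList<>();
        kept.add(null);
        String live = liveTerms.hasNext() ? liveTerms.next() : null;
        for (int ord = 1; ord < lookup.length; ord++) {
            // Skip live terms that sort before the current lookup entry.
            while (live != null && live.compareTo(lookup[ord]) < 0) {
                live = liveTerms.hasNext() ? liveTerms.next() : null;
            }
            if (live != null && live.equals(lookup[ord])) {
                kept.add(lookup[ord]);
                remap[ord] = kept.size() - 1;
            }
            // Otherwise lookup[ord] is dead: no live doc in order[] points at it.
        }
        int[] newOrder = new int[order.length];
        for (int doc = 0; doc < order.length; doc++) {
            newOrder[doc] = remap[order[doc]];
        }
        return new Pruned(kept.toArray(new String[0]), newOrder);
    }
}
```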

          Michael McCandless added a comment -

          Make that... eliminate the dead terms.

          Michael McCandless added a comment -

          Hmm good question. Actually, couldn't we pull a Term(s)Enum from the newly merged segment for that field, and use it to eliminate the non-dead terms?

          Jason Rutherglen added a comment -

          "Can't we just merge-sort into the merged StringIndex?"

          Right; however, what's the fastest way to remove terms that are no longer in the index?

          Michael McCandless added a comment -

          Can't we just merge-sort into the merged StringIndex?
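A sketch of what a merge-sort into the merged StringIndex could look like, assuming each segment's index has the shape of the 2.x/3.x `FieldCache.StringIndex`: a sorted, unique `lookup` array with slot 0 reserved for "no term", and an `order` array giving each doc's ordinal into `lookup`. `StringIndexSketch` and `StringIndexMerger` are hypothetical stand-ins, not Lucene classes. A priority queue performs a k-way merge of the sorted lookups, records an old-to-new ordinal map per segment, and rebuilds `order` in merged doc-ID space while skipping deleted docs.

```java
import java.util.*;

// Hypothetical stand-in shaped like FieldCache.StringIndex:
// lookup is sorted and unique, lookup[0] == null, order[doc] indexes lookup.
final class StringIndexSketch {
    final int[] order;
    final String[] lookup;

    StringIndexSketch(int[] order, String[] lookup) {
        this.order = order;
        this.lookup = lookup;
    }
}

final class StringIndexMerger {
    static StringIndexSketch merge(List<StringIndexSketch> segs, List<BitSet> deleted) {
        // ordMap[s][oldOrd] = ordinal of the same term in the merged lookup
        int[][] ordMap = new int[segs.size()][];
        // Each queue element is {segment, position in that segment's lookup}
        PriorityQueue<int[]> queue = new PriorityQueue<>(
            (x, y) -> segs.get(x[0]).lookup[x[1]].compareTo(segs.get(y[0]).lookup[y[1]]));
        for (int s = 0; s < segs.size(); s++) {
            ordMap[s] = new int[segs.get(s).lookup.length]; // slot 0 stays 0 (null)
            if (segs.get(s).lookup.length > 1) {
                queue.add(new int[] {s, 1});
            }
        }
        List<String> merged = new ArrayList<>();
        merged.add(null);
        while (!queue.isEmpty()) {
            int[] top = queue.poll();
            String[] lookup = segs.get(top[0]).lookup;
            String term = lookup[top[1]];
            // Equal terms from different segments come out back to back; add once.
            if (merged.size() == 1 || !term.equals(merged.get(merged.size() - 1))) {
                merged.add(term);
            }
            ordMap[top[0]][top[1]] = merged.size() - 1;
            if (top[1] + 1 < lookup.length) {
                queue.add(new int[] {top[0], top[1] + 1});
            }
        }
        // Rebuild order[] in merged doc-ID space, dropping deleted docs.
        int live = 0;
        for (int s = 0; s < segs.size(); s++) {
            live += segs.get(s).order.length - deleted.get(s).cardinality();
        }
        int[] order = new int[live];
        int newDoc = 0;
        for (int s = 0; s < segs.size(); s++) {
            int[] oldOrder = segs.get(s).order;
            for (int doc = 0; doc < oldOrder.length; doc++) {
                if (!deleted.get(s).get(doc)) {
                    order[newDoc++] = ordMap[s][oldOrder[doc]];
                }
            }
        }
        return new StringIndexSketch(order, merged.toArray(new String[0]));
    }
}
```

Note that a term whose documents were all deleted still ends up in the merged `lookup`; that is exactly the dead-term question raised in the replies to this comment.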

          Jason Rutherglen added a comment -

          We probably need to figure out a way to merge string indexes before committing this. Is there an efficient way to do it?

          Jason Rutherglen added a comment -

          Cleaned up some more and moved mergeSuccess into mergeMiddle; otherwise, the cloned readers used by the merge had already been released by the time mergeSuccess was reached.

          Jason Rutherglen added a comment -

          Deletes in the source readers should be handled correctly.

          We probably need a unit test that verifies the merged caches are exactly what they should be.

          Jason Rutherglen added a comment -

          The sanity check is fixed by skipping the entry with a null
          value. Removed some of the debugging. I think this patch
          requires a way to handle merging field caches from segments that
          have not yet created their field caches. We can generate the new
          cache if we already have at least 75% of the required caches.

          Also we need to handle deletes.
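A minimal sketch of the 75% rule, assuming "required caches" means one cache per source segment for the field being merged (the comment does not say whether segments should be weighted by size). Segments without a cache would then have to be loaded the usual way before merging. `MergeCachePolicy` and its methods are hypothetical; the threshold is checked with integer arithmetic to avoid floating-point rounding at the boundary.

```java
import java.util.*;

// Hypothetical policy illustrating "at least 75% of the required caches".
final class MergeCachePolicy {
    /** cached[i] is true when source segment i already holds a cache for the field. */
    static boolean shouldMergeCache(boolean[] cached) {
        if (cached.length == 0) {
            return false; // nothing to merge
        }
        int have = 0;
        for (boolean c : cached) {
            if (c) {
                have++;
            }
        }
        // have / cached.length >= 0.75, computed without floating point
        return 4 * have >= 3 * cached.length;
    }

    /** Source segments whose caches would have to be loaded before merging. */
    static List<Integer> segmentsToLoad(boolean[] cached) {
        List<Integer> missing = new ArrayList<>();
        for (int i = 0; i < cached.length; i++) {
            if (!cached[i]) {
                missing.add(i);
            }
        }
        return missing;
    }
}
```

Under this reading, merging 4 segments where 3 are cached would proceed after loading the remaining one, while 2 of 4 would fall back to loading the merged segment's cache from scratch.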

          Jason Rutherglen added a comment -

          This mostly works but is not committable. I've noticed we're
          creating multiple cache keys (i.e., Entry objects), one with the
          default parser and one with a null parser, that point to the
          same underlying value.

          The field cache merging then tries to merge both of these
          entries into separate objects, causing the field cache sanity
          check to fail. I'm guessing I need to find the entries that
          share the same value and choose one (the one with a parser?).

          Note: This only works when using IW.getReader.
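A sketch of the deduplication step guessed at above: before merging, collapse cache keys that point at the same underlying value object, keeping the key with an explicit parser when there is one. `CacheEntry` here is a hypothetical simplification (field, parser name, value), not Lucene's own `Entry` class. Values are compared by identity via `IdentityHashMap`, matching the observation that both keys reference the same object rather than merely equal ones.

```java
import java.util.*;

final class CacheEntryDeduper {
    // Hypothetical cache key/value pair; parser == null means no parser was given.
    static final class CacheEntry {
        final String field;
        final String parser;
        final Object value;

        CacheEntry(String field, String parser, Object value) {
            this.field = field;
            this.parser = parser;
            this.value = value;
        }
    }

    /** Keeps one entry per distinct value object, preferring one with a non-null parser. */
    static List<CacheEntry> dedupe(List<CacheEntry> entries) {
        List<CacheEntry> kept = new ArrayList<>();
        Map<Object, Integer> slotByValue = new IdentityHashMap<>(); // value object -> index in kept
        for (CacheEntry e : entries) {
            Integer slot = slotByValue.get(e.value);
            if (slot == null) {
                slotByValue.put(e.value, kept.size());
                kept.add(e);
            } else if (kept.get(slot).parser == null && e.parser != null) {
                kept.set(slot, e); // same value seen again, this key names its parser
            }
        }
        return kept;
    }
}
```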

          Mark Miller added a comment -

          I think this might have to be 3.1 ...


            People

            • Assignee: Unassigned
            • Reporter: Jason Rutherglen
            • Votes: 0
            • Watchers: 0

                Time Tracking

                • Original Estimate: 48h
                • Remaining Estimate: 48h
                • Time Spent: Not Specified
