Lucene - Core
LUCENE-1076

Allow MergePolicy to select non-contiguous merges

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 2.3
    • Fix Version/s: 3.2, 4.0-ALPHA
    • Component/s: core/index
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      I started work on this but with LUCENE-1044 I won't make much progress
      on it for a while, so I want to checkpoint my current state/patch.

      For backwards compatibility we must leave the default MergePolicy as
      selecting contiguous merges. This is necessary because some
      applications rely on "temporal monotonicity" of doc IDs, which means
      even though merges can re-number documents, the renumbering will
      always reflect the order in which the documents were added to the
      index.

      Still, for those apps that do not rely on this, we should offer a
      MergePolicy that is free to select the best merges regardless of
      whether they are contiguous. This requires fixing IndexWriter to
      accept such a merge, and fixing LogMergePolicy to optionally allow
      it the freedom to do so.
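
      For illustration only (not part of the attached patches), here is
      roughly how an application could opt in or out once such a policy
      exists, sketched with the IndexWriterConfig and TieredMergePolicy APIs
      from the 3.2-era fix versions; the relyOnDocIdOrder flag is a
      hypothetical app-level setting:

      import org.apache.lucene.analysis.WhitespaceAnalyzer;
      import org.apache.lucene.index.IndexWriter;
      import org.apache.lucene.index.IndexWriterConfig;
      import org.apache.lucene.index.LogByteSizeMergePolicy;
      import org.apache.lucene.index.TieredMergePolicy;
      import org.apache.lucene.store.Directory;
      import org.apache.lucene.store.RAMDirectory;
      import org.apache.lucene.util.Version;

      public class MergePolicyChoice {
        public static void main(String[] args) throws Exception {
          Directory dir = new RAMDirectory();
          boolean relyOnDocIdOrder = false; // hypothetical app-level flag

          IndexWriterConfig conf =
              new IndexWriterConfig(Version.LUCENE_32, new WhitespaceAnalyzer(Version.LUCENE_32));
          if (relyOnDocIdOrder) {
            // contiguous merges only: docIDs keep reflecting insertion order
            conf.setMergePolicy(new LogByteSizeMergePolicy());
          } else {
            // free to pick the best merges, even across non-adjacent segments
            conf.setMergePolicy(new TieredMergePolicy());
          }
          IndexWriter writer = new IndexWriter(dir, conf);
          writer.close();
        }
      }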

      1. LUCENE-1076.patch
        8 kB
        Michael McCandless
      2. LUCENE-1076.patch
        132 kB
        Michael McCandless
      3. LUCENE-1076.patch
        112 kB
        Michael McCandless
      4. LUCENE-1076-3x.patch
        53 kB
        Shai Erera
      5. LUCENE-1076.patch
        10 kB
        Michael McCandless

        Activity

        Michael McCandless added a comment -

        Attached my current patch. It compiles, but, quite a few tests fail!

        Shai Erera added a comment -

        So Mike - just to clarify. If I have 3 segments: A (0-52), B (53-124) and C (125-145), and you decide to merge A and C, what will be the new doc IDs of all segments? Will they start from 53? Or will you shift all the documents so that the segments will be B (0-71) and A+C (72-145)?

        Michael McCandless added a comment -

        Well... one option might be "the newly merged segment always replaces the leftmost segment". Another option could be to leave it undefined, ie IW makes no commitment as to where it will place the newly merged segment so you should not rely on it. Presumably apps that rely on Lucene's internal doc ID to "mean something" would not use a merge policy that selects non-contiguous segments.

        Unfortunately, with the current index format, there's a big cost to allowing non-contiguous segments to be merged: it means the doc stores will always be merged. Whereas, today, if you build up a large new index, no merging is done for the doc stores.

        If we someday allowed a single segment to reference multiple original doc stores (logically concatenating [possibly many] slices out of them), which would presumably be a perf hit when retrieving the stored doc or term vectors, then this cost would go away.

        Shai Erera added a comment -

        Well ... what I was thinking of is that even if the app does not care about internal doc IDs, the Lucene code may very well care. If we don't shift doc IDs back, it means maxDoc will continue to grow, and at some point (extreme case though), maxDoc will equal 1M, while there will be just 50K docs in the index.

        AFAIU, maxDoc is used today to determine array length in FieldCache, and I've seen it used in IndexSearcher to sort the sub readers (at least in the past), etc. So perhaps alongside maxDoc we'll need to keep a curNumDocs member to track the actual number of documents?

        But I have a feeling this will also get complicated.

        Michael McCandless added a comment -

        maxDoc() is computed by simply summing docCount of all segments in the index; it shouldn't ever grow.

        Shai Erera added a comment -

        But how is a new doc ID allocated? Not by calling maxDoc()? So if maxDoc() = 50K, but there is a document w/ ID 1M (and possibly another one w/ 50K), won't that be a problem?

        Shai Erera added a comment -

        Besides, Mike, there's something I don't understand from a previous comment you made: you commented that today, if I build a large index, the doc stores are not merged, whereas if we move to merging non-contiguous segments, they will be. I'm afraid I'm not familiar with this area of Lucene – if I merge two consecutive segments, how come I don't merge their doc stores?

        Michael McCandless added a comment -

        But how is a new doc ID allocated?

        Doc IDs are logically assigned by summing the docCount of all segments before me, as my base, and then adding the "index" of the doc within my segment. Ie, the base of a given segment is not stored anywhere, so we are always free to shuffle the order of segments and nothing in Lucene should care (but the app might).
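
        As a toy illustration of that bookkeeping (plain Java, not Lucene's actual classes), using the segment sizes from the A/B/C example above:

        public class DocIdBases {
          public static void main(String[] args) {
            // hypothetical per-segment docCounts: A=53, B=72, C=21 (the A/B/C example above)
            int[] docCounts = { 53, 72, 21 };

            int base = 0;
            for (int seg = 0; seg < docCounts.length; seg++) {
              // global docID of doc d in this segment = base + d
              System.out.println("segment " + seg + ": docIDs " + base + ".." + (base + docCounts[seg] - 1));
              base += docCounts[seg];
            }
            // maxDoc() is just the running sum, independent of segment order
            System.out.println("maxDoc = " + base);
          }
        }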

        Michael McCandless added a comment -

        if I merge two consecutive segments, how come I don't merge their doc stores

        Multiple segments are able to share a single set of doc-store (=
        stored fields & term vectors) files, today. This only happens when
        multiple segments are written in a single IndexWriter session with
        autoCommit=false.

        EG if I open a writer, index all of wikipedia w/ autoCommit false, and
        close it, you'll see a single large set of doc store files (eg _0.fdt,
        _0.fdx, _0.tvf, _0.tvd, _0.tvx).

        Whenever RAM is full (with postings & norms data), a new segment is
        flushed, but the doc store files are kept open & shared with further
        flushed segments.

        A single segment then refers to the shared doc stores, but records its
        "offset" within them.

        So, when we merge contiguous segments, because the resulting docs are
        also contiguous in the doc stores, we are able to store a single doc
        store offset in the merged segment, referencing the original doc
        store, and it works fine.

        But if we merge non-contiguous segments, we must then pull out & merge
        the "slices" from the doc stores into a new [private to the new
        segment] set of doc store files.

        For apps that store term vectors w/ positions & offsets, and have many
        stored fields, and have heterogeneous field name -> number assignments
        (see LUCENE-1737 to fix that), the merging of doc stores can easily
        dominate the merge cost.
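
        To make the offset mechanics concrete, here is a toy model (the class
        names below are made up for illustration, not Lucene's actual code):
        several segments point into one shared doc store via a single offset,
        which only works while each segment's stored docs form one contiguous
        slice.

        import java.util.ArrayList;
        import java.util.List;

        public class SharedDocStoreSketch {
          // stands in for the shared _0.fdt/_0.fdx/_0.tv* files
          static class SharedDocStore {
            final List<String> storedDocs = new ArrayList<String>();
          }

          // a segment records only an offset into the shared store, plus its docCount
          static class Segment {
            final int docStoreOffset;
            final int docCount;
            Segment(int docStoreOffset, int docCount) {
              this.docStoreOffset = docStoreOffset;
              this.docCount = docCount;
            }
            String storedDoc(SharedDocStore store, int docID) {
              return store.storedDocs.get(docStoreOffset + docID);
            }
          }

          public static void main(String[] args) {
            SharedDocStore store = new SharedDocStore();
            for (int i = 0; i < 6; i++) store.storedDocs.add("doc" + i);

            Segment a = new Segment(0, 3); // slice 0..2 of the shared store
            Segment b = new Segment(3, 3); // slice 3..5 of the shared store

            // Merging the contiguous a+b: the merged segment keeps offset 0 and
            // docCount 6, still pointing at the same shared files; no copying.
            Segment merged = new Segment(0, a.docCount + b.docCount);
            System.out.println(merged.storedDoc(store, 4)); // prints "doc4"

            // Merging non-contiguous segments would need two slices, which a single
            // offset cannot express, so their doc stores must be rewritten.
          }
        }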

        Tim Smith added a comment -

        I suppose you could do a preliminary round of merging that would merge together segments that share doc store/term vector data.

        Once this preliminary round of merging is done, you would no longer need to slice the doc stores up, just merge them together (contiguous or non-contiguous wouldn't matter anymore; however, if a "segmented session" still exists higher up, this would prevent you from selecting these segments, or newer segments).

        It might even be desirable to have commit() optionally perform this merging before the commit finishes, as this would result in each commit producing one segment, regardless of the number of flushes that were done under the hood.

        Jason Rutherglen added a comment -

        if I merge two consecutive segments, how come I don't
        merge their doc stores?

        You may want to take a look at
        SegmentInfo.docStoreOffset/docStoreSegment which is the pointer
        to the docstore file data for that SI.

        Shai Erera added a comment -

        Thanks for the education everyone.

        Mike - it feels to me, even though I can't pinpoint it at the moment (FieldCache maybe?), that if maxDoc doesn't reflect the number of documents in the index we'll run into trouble. Therefore I suggest you consider introducing another numDocs() method which returns the actual number of documents in the index.

        Michael McCandless added a comment -

        maxDoc() does reflect the number of docs in the index. It's simply the sum of docCount for all segments. Shuffling the order of the segments, or allowing non-contiguous segments to be merged, won't change how maxDoc() is computed.

        New docIDs are allocated by incrementing an integer (starting with 0) for the buffered docs. When a segment gets flushed, we reset that to 0. Ie, docIDs are stored within one segment; they have no "context" from prior segments.

        Shai Erera added a comment -

        Oh. Thanks for correcting me. In that case, I take what I said back.

        I think this together w/ LUCENE-1750 can really help speed up segment merges in certain scenarios. Will wait for you to come back to it

        Michael McCandless added a comment -

        Will wait for you to come back to it

        Feel free to take it, too

        I think LUCENE-1737 is also very important for speeding up merging, especially because it's so "unexpected" that just adding different fields to your docs, or the same fields in different orders, can so severely impact merge performance.

        Michael McCandless added a comment -

        Unassigning myself.

        Shai Erera added a comment -

        Feel free to take it, too

        I don't mind taking a stab at it. But this doesn't mean you can unassign yourself. I'll need someone to commit it.

        Shai Erera added a comment -

        Can someone please help me understand what's going on here? After I applied the patch to trunk, TestIndexWriter.testOptimizeMaxNumSegments2() fails. The failure happens only if CMS is used, and doesn't when SMS is used. I dug deeper into the test and what happens is that the test asks to optimize(maxNumSegments) and expects that either: (1) if the number of segments was < maxNumSegments, then the resulting number of segments is exactly as it was before, or (2) otherwise it should be exactly maxNumSegments.

        First, the javadocs of optimize(maxNumSegments) say that it will result in <= maxNumSegments, but I understand the LogMergePolicy ensures that if you ask for maxNumSegments, that's the number of segments you'll get.

        While trying to debug what's wrong w/ the change so far, I managed to reduce the test to this code:

        public void test1() throws Exception {
            MockRAMDirectory dir = new MockRAMDirectory();
        
            final Document doc = new Document();
            doc.add(new Field("content", "aaa", Field.Store.YES, Field.Index.ANALYZED));
        
            IndexWriter writer  = new IndexWriter(dir, new WhitespaceAnalyzer(), true, IndexWriter.MaxFieldLength.LIMITED);
        //    writer.setMergeScheduler(new SerialMergeScheduler());
            LogDocMergePolicy ldmp = new LogDocMergePolicy();
            ldmp.setMinMergeDocs(1);
            writer.setMergePolicy(ldmp);
            writer.setMergeFactor(3);
            writer.setMaxBufferedDocs(2);
        
            MergeScheduler ms = writer.getMergeScheduler();
        //  writer.setInfoStream(System.out);
            
            // Add enough documents to create several segments (uncommitted) and kick off
            // some threads.
            for (int i = 0; i < 20; i++) {
              writer.addDocument(doc);
            }
            writer.commit();
            
            if (ms instanceof ConcurrentMergeScheduler) {
              // Wait for all merges to complete
              ((ConcurrentMergeScheduler) writer.getMergeScheduler()).sync();
            }
            
            SegmentInfos sis = new SegmentInfos();
            sis.read(dir);
            
            System.out.println("numSegments after add + commit ==> " + sis.size());
            
            final int segCount = sis.size();
            
            int maxNumSegments = 3;
            writer.optimize(maxNumSegments);
            writer.commit();
            
            if (ms instanceof ConcurrentMergeScheduler) {
              // Wait for all merges to complete
              ((ConcurrentMergeScheduler) writer.getMergeScheduler()).sync();
            }
            
            sis = new SegmentInfos();
            sis.read(dir);
            final int optSegCount = sis.size();
            
            System.out.println("numSegments after optimize (" + maxNumSegments + ") + commit ==> " + sis.size());
            
            if (segCount < maxNumSegments)
              Assert.assertEquals(segCount, optSegCount);
            else
              Assert.assertEquals(maxNumSegments, optSegCount);
        }
        

        This fails almost every time that I run it, so if you try it - make sure to run it a couple of times. I then switched to trunk, but it fails almost consistently on trunk also !?!?

        Can someone please have a look and tell me what's wrong (is it the test, or did I hit a true bug in the code?)?

        Shai Erera added a comment -

        Hmm ... I think I found the problem. I added commit() after cms.sync() and it never failed again. So I checked the output of infoStream in two cases (failure and success) and found this: in the success case, the pending merges occurred before the last addDocument calls (actually the last 2), therefore commit() committed those pending merges' output, and sync() afterwards did nothing.

        In the failure case, the last pending merge happened after commit() was called, whether or not as part of the sync() call, but it was never committed.

        So it looks to me that I should add this test case as testOptimizeMaxNumSegments3() (even though it has nothing to do w/ optimize()), just to cover this aspect and also document CMS.sync() to mention that a commit() after it is required if the outcome of the merges should be reflected in the index (i.e., committed).
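
        In code, the fix described above boils down to committing after the scheduler has been synced (a sketch against the same era of APIs as the test; optimizeAndCommit is just an illustrative helper name):

        import org.apache.lucene.index.ConcurrentMergeScheduler;
        import org.apache.lucene.index.IndexWriter;
        import org.apache.lucene.index.MergeScheduler;

        public class CommitAfterSync {
          static void optimizeAndCommit(IndexWriter writer, MergeScheduler ms, int maxNumSegments) throws Exception {
            writer.optimize(maxNumSegments);            // kick off the optimize merges
            if (ms instanceof ConcurrentMergeScheduler) {
              ((ConcurrentMergeScheduler) ms).sync();   // wait for any still-running background merges
            }
            writer.commit();                            // without this, the merge output stays uncommitted
          }
        }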

        Did I get it right?

        Shai Erera added a comment -

        BTW, the second sync() call comes after optimize(), which is redundant as far as I understand, since optimize() or optimize(int) will wait for all merges to complete, including the ones CMS runs.

        I wonder then if it won't be useful to have a commit(doWait=true), which won't require calling sync() or waitForMerges()?

        Michael McCandless added a comment -

        I added commit() after cms.sync() and it never failed again.

        I think you are right! Since we try to read the dir (sis.read), if we don't commit then the changes won't be present since IndexWriter is opened w/ autoCommit false. Another simple check would be to call getReader().getSequentialSubReaders() and check how many segments there are, instead of having to go through the Directory to check it.
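
        As a small sketch of that alternative check (relying on the getReader() NRT API available at the time; segmentCount is just an illustrative helper name):

        import org.apache.lucene.index.IndexReader;
        import org.apache.lucene.index.IndexWriter;

        public class SegmentCountCheck {
          static int segmentCount(IndexWriter writer) throws Exception {
            IndexReader r = writer.getReader();          // near-real-time reader
            try {
              return r.getSequentialSubReaders().length; // one sub-reader per segment
            } finally {
              r.close();
            }
          }
        }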

        BTW, the second sync() call comes after optimize(), which is redundant as far as I understand

        I agree.

        I wonder then if it won't be useful to have a commit(doWait=true), which won't require calling sync() or waitForMerges()?

        I think we can leave this separate (ie you should call waitForMerges() if you need to), because commit normally has nothing to do w/ merging, since merging doesn't change any docs in the index. Commit only ensures that changes to the index are pushed to stable storage. Whereas, eg, optimize is all about doing merges, so it makes sense for it to have a doWait?

        Shai Erera added a comment -

        Ok, I agree that commit should not wait for merges. It does seem unrelated to segment merging.

        Michael McCandless added a comment -

        I'll take a crack at this. It's compelling, now that we always bulk merge doc stores...

        Michael McCandless added a comment -

        Patch. I think it's ready to commit...

        To stress test out-of-order merging, I made a fun new MockRandomMergePolicy
        (swapped in half the time by LTC.newIndexWriterConfig) which randomly
        decides when to do a merge, then randomly picks how many segments to
        merge, and then randomly picks which ones. Also, I modified
        LogMergePolicy to add a boolean get/setRequireContiguousMerge,
        defaulting to false.

        Many tests rely on in-order docIDs during merging, so I had to wire
        them to use in-order LogMP.

        I also reworked how buffered deletes are managed, so that each
        "packet" of buffered deletes, as well as each flushed segment, is now
        assigned an incrementing gen. This way, when it's time to apply
        deletes, the algorithm is easy: only delete packets with a gen >= this
        segment's gen should coalesce and apply.
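
        A toy sketch of the gen rule as just described (the classes below are
        made up for illustration, not the actual IndexWriter/DocumentsWriter
        code):

        import java.util.ArrayList;
        import java.util.List;

        public class DeleteGenSketch {
          static class DeletePacket {
            final long gen;                // gen assigned when the packet was pushed
            final List<String> deleteTerms;
            DeletePacket(long gen, List<String> deleteTerms) {
              this.gen = gen;
              this.deleteTerms = deleteTerms;
            }
          }

          static class SegmentState {
            final String name;
            final long gen;                // gen assigned when the segment was flushed
            SegmentState(String name, long gen) {
              this.name = name;
              this.gen = gen;
            }
          }

          // only packets with gen >= the segment's gen are coalesced and applied to it
          static List<String> deletesToApply(SegmentState segment, List<DeletePacket> packets) {
            List<String> coalesced = new ArrayList<String>();
            for (DeletePacket p : packets) {
              if (p.gen >= segment.gen) {
                coalesced.addAll(p.deleteTerms);
              }
            }
            return coalesced;
          }
        }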

        Separately, eventually, I'd like to switch to a better default MP,
        something like BSMP where immense merges are done w/ small mergeFactor
        (eg, 2), and tiny merges are done w/ large mergeFactor.

        Jason Rutherglen added a comment -

        I also reworked how buffered deletes are managed, so that each
        "packet" of buffered deletes, as well as each flushed segment, is now
        assigned an incrementing gen. This way, when it's time to apply
        deletes, the algorithm is easy: only delete packets with gen >= this
        segment should coalesce and apply.

        I saw this and thought it was interesting. Why is the gen needed?

        Michael McCandless added a comment -

        I saw this and thought it was interesting. Why is the gen needed?

        So, at first I added it because the pushing of merged delete packets
        got too hairy, eg when merges interleave you'd have to handle deletes
        being pushed onto each other's internal merged segments.

        Also, we really needed a transactional data structure here, because
        previously DW could push more deletes into an existing packet (ie the
        packet was not write-once), which made tracking problematic if the
        merge wanted to record that the first batch of deletes had been
        applied but not any subsequent pushes.

        But, after making the change, I realized that today (trunk, 3.1) we
        are badly inefficient! We apply deletes to segments being merged, but
        then we place the merged segment back in the same position. This is
        inefficient because later when this segment gets merged, we wastefully
        re-apply the same deletes (plus, new ones, which do need to be
        applied). This is a total waste.

        So, by decoupling tracking of where you are in the deletes packet
        stream, from the physical location of your segment in the index, we
        fix this waste. Also, it's quite a bit simpler now – we no longer
        have to merge deletes on completing a merge.

        Michael McCandless added a comment -

        Another benefit of the transaction log for deletes is that, because they are write-once (ie, after a set of buffered deletes is pushed, they are never changed), we can switch to a more efficient data structure than a TreeMap on push.

        Eg, we can pull the del Terms (sorted) and store them in an array.
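
        As a sketch of that idea (using String as a stand-in for Term): since a TreeMap iterates its keys in sorted order, the push can freeze them into a flat, already-sorted array:

        import java.util.TreeMap;

        public class FreezeDeletes {
          // keySet() iterates in sorted order, so the resulting array is already sorted
          static String[] freeze(TreeMap<String, Integer> bufferedDeleteTerms) {
            return bufferedDeleteTerms.keySet().toArray(new String[bufferedDeleteTerms.size()]);
          }
        }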

        Michael McCandless added a comment -

        Committed to trunk; I'll let this age for a while before back porting.

        On the backport, I'll leave the default as contiguous merges.

        Michael McCandless added a comment -

        I think we can do this for 3.1, but I'll leave the default as always doing contiguous merges.

        Michael McCandless added a comment -

        I changed my mind! Pushing this to 3.2 now.

        Michael McCandless added a comment -

        Phew, this patch almost fell below the event horizon of my TODO list...

        I'm attaching a new, modernized one. I also mod'd the policy to not select two max-sized merges at once. I think it's ready to commit...

        Shai Erera added a comment -

        Patch against 3x. This is not ready to commit yet, as many tests fail on exceptions like this:

            [junit] java.lang.IndexOutOfBoundsException
            [junit]     at java.util.AbstractList.subList(AbstractList.java:763)
            [junit]     at java.util.Vector.subList(Vector.java:975)
            [junit]     at org.apache.lucene.index.IndexWriter.commitMerge(IndexWriter.java:3550)
            [junit]     at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:4057)
            [junit]     at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3631)
        

        Mike says there was an earlier commit (which handled how deletes are flushed) that this depends on, and that I can continue only after he back-ports it.

        In the meantime, I've fixed tests that assumed LogMP (for setting compound and mergeFactor) by adding LTC.setUseCompoundFile and LTC.setMergeFactor as utility methods.

        Will continue after Mike back-ports the dependencies.

        Michael McCandless added a comment -

        Thanks Shai!

        Michael McCandless added a comment -

        I think we should change 3.2's default MP to TieredMP?

        This means docIDs may be reordered, since Tiered MP can merge out-of-order segments.

        Robert Muir added a comment -

        +1 Mike

        Simon Willnauer added a comment -

        This means docIDs may be reordered, since Tiered MP can merge out-of-order segments.

        I think this is a very hard break and it should depend on the Version you pass to IWC. Stuff like that is really a good use case for Version. I had customers in the past that heavily depend on the Lucene doc ID, even though it is not recommended, but with this change their app will suddenly not work anymore. So we should make sure that they can upgrade seamlessly!

        Uwe Schindler added a comment - edited

        This means docIDs may be reordered, since Tiered MP can merge out-of-order segments.

        I think this is a very hard break and it should depend on the Version you pass to IWC. Stuff like that is really a good use case for Version. I had customers in the past that heavily depend on the Lucene doc ID, even though it is not recommended, but with this change their app will suddenly not work anymore. So we should make sure that they can upgrade seamlessly!

        I think we should also warn people who have this problem about IndexUpgrader (LUCENE-3082), because it has the same problem. Segments are reordered (segments that were upgraded before a call to MP's optimize come first, then the upgraded ones). Maybe we should add this to JavaDocs in 3.x.

        I'll reopen LUCENE-3082.

        Michael McCandless added a comment -

        I think this is a very hard break and it should depend on the Version you pass to IWC.

        +1

        I'll make it TieredMP if version >= 3.2, else LogByteSizeMP.
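
        Roughly, as a sketch of the proposed Version-gated default (not the actual IndexWriterConfig code):

        import org.apache.lucene.index.LogByteSizeMergePolicy;
        import org.apache.lucene.index.MergePolicy;
        import org.apache.lucene.index.TieredMergePolicy;
        import org.apache.lucene.util.Version;

        public class DefaultMergePolicySketch {
          static MergePolicy defaultMergePolicy(Version matchVersion) {
            if (matchVersion.onOrAfter(Version.LUCENE_32)) {
              return new TieredMergePolicy();      // may merge non-contiguous segments
            }
            return new LogByteSizeMergePolicy();   // preserves the old contiguous behavior
          }
        }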

        Michael McCandless added a comment -

        Patch against 3.x, changing the default if the matchVersion passed to IWC is >= LUCENE_32.

        Robert Muir added a comment -

        Bulk closing for 3.2


          People

          • Assignee:
            Michael McCandless
            Reporter:
            Michael McCandless
          • Votes:
            0
            Watchers:
            4
