Lucene - Core
LUCENE-2655

Get deletes working in the realtime branch

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Won't Fix
    • Affects Version/s: Realtime Branch
    • Fix Version/s: Realtime Branch
    • Component/s: core/index
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      Deletes don't work anymore; a patch here will fix this.

      Attachments

        1. LUCENE-2655.patch (4 kB, Jason Rutherglen)


          Activity

          Jason Rutherglen added a comment -

          Here's a relevant comment from LUCENE-2324: https://issues.apache.org/jira/browse/LUCENE-2324?focusedCommentId=12891256&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12891256

          Jason Rutherglen added a comment -

          Maybe we could reuse (factor out) TermsHashPerField's
          custom hash here, for the buffered Terms? It efficiently maps a
          BytesRef --> int.

          I'm trying to get a feel for what kind of deletes we want
          working in the flush-by-DWPT merge vis-a-vis the realtime branch
          (ie, the release where we have the new realtime search/indexing
          functionality).

          Factoring out BytesHash and storing the terms in a byte block
          pool will allow replacing the current hash map of terms and
          likely conserve more RAM. Will we need to replace doc id upto
          and instead use sequence ids?

          We can additionally, for now, implement flushing deletes on
          merges, like today, for the flush-by-DWPT merge to trunk. Then
          implement live (aka foreground) deletes for the realtime search
          branch merge to trunk.
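
          For reference, a minimal sketch of buffering delete terms in a
          BytesRefHash backed by a ByteBlockPool, roughly the direction
          described above. The DeleteTermBuffer class is hypothetical, and the
          BytesRefHash/ByteBlockPool/Counter usage assumes the API as it was
          later factored out into org.apache.lucene.util:

            import org.apache.lucene.util.ByteBlockPool;
            import org.apache.lucene.util.BytesRef;
            import org.apache.lucene.util.BytesRefHash;
            import org.apache.lucene.util.Counter;

            // Hypothetical sketch: store each unique delete term once in the
            // byte block pool and map it to a small int id.
            class DeleteTermBuffer {
              private final Counter bytesUsed = Counter.newCounter();
              private final BytesRefHash terms = new BytesRefHash(
                  new ByteBlockPool(new ByteBlockPool.DirectTrackingAllocator(bytesUsed)));

              // Returns the id for this term (the existing id if already buffered).
              int buffer(BytesRef term) {
                int id = terms.add(term);
                return id < 0 ? -id - 1 : id;  // add() returns -(id+1) when the term is already present
              }

              long ramBytesUsed() {
                return bytesUsed.get();  // the pool's allocator tracks bytes used
              }

              int numTerms() {
                return terms.size();
              }
            }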

          Jason Rutherglen added a comment -

          A couple more things...

          The BytesHash or some other aptly named class can implement the
          quicksort of terms, again from TermsHashPerField, which will
          replace the sorted terms map used in the RT branch for deletes.
          The RT branch isn't yet calculating the RAM usage of deletes. By
          using the byte block pool, calculating the RAM usage will be
          trivial (as the BBP automatically records the number of bytes used).

          The RT branch has an implementation of delete using the min/max
          sequence ids for a given segment. What else is needed?

          Jason Rutherglen added a comment -

          Here's a basic patch with a test case showing delete-by-term not working. It's simply not finding the docs in the reader in applyDeletes; I'm guessing it's something basic that's wrong.

          Jason Rutherglen added a comment -

          Maybe I was using the new flex API wrongly; when I pass the deleted docs in to MultiFields.getTermDocsEnum, the test case passes.

          Jason Rutherglen added a comment -

          I'm only seeing this test from TestIndexWriterDelete fail:

          [junit] junit.framework.AssertionFailedError: expected:<3> but was:<0>
          [junit] at org.apache.lucene.index.TestIndexWriterDelete.testMaxBufferedDeletes(TestIndexWriterDelete.java:118)
          [junit] at org.apache.lucene.util.LuceneTestCase.runBare(LuceneTestCase.java:328)

          Jason Rutherglen added a comment -

          There's this one as well, which I'll focus on, though it's a failure in IR.isCurrent and doesn't immediately appear to be related to deletes.

          [junit] Testsuite: org.apache.lucene.index.TestIndexWriterReader
          [junit] Testcase: testUpdateDocument(org.apache.lucene.index.TestIndexWriterReader):	FAILED
          [junit] null
          [junit] junit.framework.AssertionFailedError: null
          [junit] 	at org.apache.lucene.index.TestIndexWriterReader.testUpdateDocument(TestIndexWriterReader.java:104)
          [junit] 	at org.apache.lucene.util.LuceneTestCase.runBare(LuceneTestCase.java:328)
          
          Michael McCandless added a comment -

          Stepping back here...

          There are two different goals mixed into the RT branch effort, I
          think:

          1. Make thread states fully independent so flushing is no longer sync'd (plus it's a nice simplification, eg no more *PerThread in the indexing chain)
          2. Enable direct searching on the thread states RAM buffer, for awesome
            NRT performance

          It seems to me like the first one is not so far off? Ie we nearly
          have it already (LUCENE-2324)... it's just that we don't have the
          deletes working?

          Whereas the 2nd one is a much bigger change, and still iterating under
          LUCENE-2475.

          Is it possible to decouple #1 from #2? Ie, bring it to a committable
          state and land it on trunk and let it bake some?

          Eg, on deletes, what if we simply have each thread state buffer its own
          delete term -> thread's docID, privately? We know this approach
          will "work" (it does today), right? It's just wasteful of RAM (though
          cutover to BytesRefHash should help a lot here), and makes
          deletes somewhat slower since you must now enroll the del term
          into each thread state....

          It wouldn't actually be that wasteful of RAM, since the BytesRef instance
          would be shared across all the maps. Also, if we wanted (separately)
          we could make a more efficient buffer when the app deletes many terms
          at once, or, many calls to delete-by-single-term with no adds in between
          (much like how Amazon optimizes N one-click purchases in a row...).

          I really want to re-run my 6-thread indexing test on beast and see the
          indexing throughput double!!
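
          A minimal sketch of the per-thread-state buffering described above
          (class and field names here are hypothetical; only the term ->
          docid-upto map and the enroll step are shown):

            import java.util.HashMap;
            import java.util.Map;
            import org.apache.lucene.index.Term;

            // Hypothetical sketch: each thread state (DWPT) privately buffers
            // delete term -> its own current docID ("docid-upto").
            class ThreadStateDeletes {
              final Map<Term, Integer> pendingDeleteTerms = new HashMap<Term, Integer>();

              // Called for each thread state when a delete-by-term is buffered.
              // The Term instance can be shared across all the per-thread maps,
              // so the extra RAM is mostly the map entry itself.
              void bufferDeleteTerm(Term term, int currentDocIDUpto) {
                // A later enrollment overwrites an earlier one; the delete applies
                // to every doc this thread state indexed before currentDocIDUpto.
                pendingDeleteTerms.put(term, currentDocIDUpto);
              }
            }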

          Jason Rutherglen added a comment -

          Are you saying we should simply make deletes work (as is, no BytesRefHash conversion) then clean up the RT branch as a merge to trunk of the DWPT changes? I was thinking along those lines. I can spend time making the rest of the unit tests work on the existing RT revision, though this should probably happen in conjunction with a merge from trunk.

          Or simply make the tests pass, and merge RT -> trunk afterwards?

          Also, from what I've seen, deletes seem to work; I'm not sure what exactly Michael is referring to. I'll run the full 'suite' of unit tests, and just make each work?

          Michael McCandless added a comment -

          Are you saying we should simply make deletes work (as is, no BytesRefHash conversion) then clean up the RT branch as a merge to trunk of the DWPT changes?

          Yes! Just keep using the Map we now use, but now it must be per-DWPT. And when IW.deleteDocs(Term/Query) is called, we must go to each DWPT, grab its current docID, and enroll Term/Query -> docID into that DWPT's pending deletes map.

          Also, from what I've seen, deletes seem to work; I'm not sure what exactly Michael is referring to.

          But this is using the long sequenceID right? (adds 8 bytes per buffered docID). I think we wanted to explore ways to reduce that? Or, if we can make it modal (you only spend these 8 bytes if IW is open in NRT mode), then that's OK?

          I was hoping to cleanly separate the DWPT cutover (solves the nasty flush bottleneck) from the NRT improvements.

          Jason Rutherglen added a comment -

          when IW.deleteDocs(Term/Query) is called, we must go to each DWPT, grab its current docID, and enroll Term/Query -> docID into that DWPT's pending deletes map.

          Ok, that's the change you're referring to. In the current RT revision, the deletes are held in one map in DW; I guess we need to change that. However, if we do, why do we need to keep the seq id or docid as the value in the map? When the delete arrives into the DWPT, we know that any buffered docs with that term/query need to be deleted on flush? (ie, let's not worry about the RT search use case, yet). ie2, we can simply add the terms/queries to a set, and apply them on flush, a la LUCENE-2679?

          NRT improvements

          We're referring to LUCENE-1516 as NRT and LUCENE-2312 as 'RT'. I'm guessing you mean RT?

          Michael McCandless added a comment -

          I'm guessing you mean RT?

          Duh sorry yes... I'll try to use the right name

          In the current RT revision, the deletes are held in one map in DW; I guess we need to change that.

          Right, one map that maps the del term to the long sequence ID, right? I'm thinking we revert just this part, and go back to how trunk buffers deletes (maps to docID), but per-DWPT.

          However, if we do, why do we need to keep the seq id or docid as the value in the map? When the delete arrives into the DWPT, we know that any buffered docs with that term/query need to be deleted on flush? (ie, let's not worry about the RT search use case, yet). ie2, we can simply add the terms/queries to a set, and apply them on flush, a la LUCENE-2679?

          For the interleaved case. Ie, LUCENE-2679 is optional – we still must handle the interleaved case correctly (and, I think, by default). But if an app uses the opto in LUCENE-2679 then we only need a single Set.

          Basically, the DWPT change, alone, is hugely valuable and (I think) decouple-able from the trickier/riskier/RAM-consuminger RT changes.

          Jason Rutherglen added a comment -

          go back to how trunk buffers deletes (maps to docID), but per-DWPT.

          I don't think we need seq ids for the flush-by-DWPT merge to trunk. I was getting confused about the docid being a start-from rather than a stop-at. I'll implement a map of query/term -> docid-upto per DWPT.

          Basically, the DWPT change, alone, is hugely valuable and (I think) decouple-able from the trickier/riskier/RAM-consuminger RT changes.

          Yes indeed!

          I'll try to use the right name

          NRT -> RT. The next one will need to be either R or T; shall we decide now or later?
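
          A sketch of the "stop at" semantics of the per-DWPT docid-upto map
          discussed above, applied at flush time (names are hypothetical, and a
          java.util.BitSet plus a term -> local-docIDs map stand in for the
          real deleted-docs bit vector and postings):

            import java.util.BitSet;
            import java.util.Map;
            import org.apache.lucene.index.Term;

            // Hypothetical sketch: a buffered delete only affects docs this DWPT
            // indexed before the delete arrived, ie local docID < docid-upto.
            class DWPTFlushSketch {
              static void applyBufferedDeletes(Map<Term, Integer> pendingDeleteTerms,
                                               Map<Term, int[]> termToLocalDocIDs,  // stand-in for the real postings
                                               BitSet deletedDocs) {
                for (Map.Entry<Term, Integer> e : pendingDeleteTerms.entrySet()) {
                  int docIDUpto = e.getValue();              // "stop at", not "start from"
                  int[] docs = termToLocalDocIDs.get(e.getKey());
                  if (docs == null) continue;
                  for (int docID : docs) {
                    if (docID < docIDUpto) {
                      deletedDocs.set(docID);
                    }
                  }
                }
              }
            }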

          Jason Rutherglen added a comment -

          We've implied an additional change to the way deletes are flushed: today they're flushed in applyDeletes when segments are merged, whereas with flush-DWPT we're applying deletes after flushing the DWPT segment.

          Also, we'll have a globalesque buffered deletes structure, presumably located in IW, that buffers deletes for the existing segments, and these should [as today] be applied only when segments are merged or getReader is called?

          Michael McCandless added a comment -

          We've implied an additional change to the way deletes are flushed: today they're flushed in applyDeletes when segments are merged, whereas with flush-DWPT we're applying deletes after flushing the DWPT segment.

          Why is that change needed again? Deferring until merge kickoff is a win, eg for a non N/R/T (heh) app, it means 1/10th the reader open cost (w/ default mergeFactor=10)? Opening/closing readers can be costly for a large index.

          Really, some day, we ought to only apply deletes to those segments about to be merged (and keep the buffer for the rest of the segments). Eg most merges are small... yet we'll pay huge cost opening that massive grandaddy segment every time these small merges kick off. But that's another issue...

          Why can't we just do what we do today? Ie push the DWPT buffered deletes into the flushed deletes?

          Jason Rutherglen added a comment -

          Ok, I have been stuck/excited about not having to use/understand the remap-docids method, because it's hard to debug. However, I see what you're saying, and why remap-docids exists. I'll push the DWPT buffered deletes to the flushed deletes.

          we'll pay huge cost opening that massive grandaddy segment

          This large cost is from loading the terms index and deleted docs? When those large segments are merged though, the IO cost is so substantial that loading tii or del into RAM probably doesn't account for much of the aggregate IO; they're probably in the noise? Or are you referring to the NRT applyDeletes flush? However, that is on a presumably pooled reader. Or you're just saying that today we're applying deletes across the board to all segments prior to a merge, regardless of whether or not they're even involved in the merge? It seems like that is changeable?

          Michael McCandless added a comment -

          Ok, I have been stuck/excited about not having to use/understand the remap-docids method, because it's hard to debug. However, I see what you're saying, and why remap-docids exists. I'll push the DWPT buffered deletes to the flushed deletes.

          I think we still must remap, at least on the pushed (deletesFlushed) deletes?

          On the buffered deletes for the DWPT (deletesInRAM), I think we can make these relative to the DWPT (ie start from 0), but on pushing them into flushed deletes we re-base them?

          This large cost is from loading the terms index and deleted docs?

          Yes. We don't (hopefully) load norms, field cache, etc.

          When those large segments are merged though, the IO cost is so substantial that loading tii or del into RAM probably doesn't account for much of the aggregate IO; they're probably in the noise?

          Well, the applyDeletes method is sync'd, vs merging which is fully concurrent. (Also, merging doesn't load the tii).

          Or are you referring to the NRT applyDeletes flush? However, that is on a presumably pooled reader.

          Right, it would be pooled for the NRT case, so this is only a (sizable) perf problem for the non-nrt case.

          Or you're just saying that today we're applying deletes across the board to all segments prior to a merge, regardless of whether or not they're even involved in the merge? It seems like that is changeable?

          Right! That's what we do today (apply deletes to all segs) whereas it's really only necessary to apply them to the segments being merged. I opened LUCENE-2680 to track this.

          Jason Rutherglen added a comment -

          With per-DWPT deletes, we're maintaining a docid-upto/limit per term/query. When the DWPT deletes are transferred to the IW flushed deletes, there isn't a way to globalize the docid-upto/limit value that's been maintained per DWPT. Today we globalize the docid-upto/limit value because we're only flushing one segment at a time.

          We need to implement per-segment delete queuing to make this patch work; basically, implement LUCENE-2680. I'll start in this direction.

          Michael McCandless added a comment -

          With per-DWPT deletes, we're maintaining a docid-upto/limit per term/query. When the DWPT deletes are transferred to the IW flushed deletes, there isn't a way to globalize the docid-upto/limit value that's been maintained per DWPT. Today we globalize the docid-upto/limit value because we're only flushing one segment at a time.

          I think we can still globalize?

          The trick is it'd require doing the remapping in the same sync block that appends the new SegmentInfo into the SegmentInfos, right? Because it's at that moment that the newly flushed segment is assigned its position w/ the rest of the segments. And so, at that moment, it knows the sum of the maxDoc() of all prior segments, and it can globalize all of the docIDs mapped to the pending delete Term/Query?
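
          A sketch of that globalization step, assuming it runs inside the same
          synchronized block that appends the new SegmentInfo (names are
          hypothetical). As the follow-up comments point out, a single flushed
          map like this loses per-DWPT limits when the same term is buffered in
          more than one DWPT:

            import java.util.Map;
            import org.apache.lucene.index.Term;

            // Hypothetical sketch: rebase a DWPT's local docid-uptos to global docIDs
            // at the moment its segment is appended to the SegmentInfos, when the sum
            // of maxDoc() over all prior segments is known.
            class FlushedDeletesSketch {
              final Map<Term, Integer> deletesFlushed;

              FlushedDeletesSketch(Map<Term, Integer> deletesFlushed) {
                this.deletesFlushed = deletesFlushed;
              }

              synchronized void pushDeletes(Map<Term, Integer> dwptPendingDeletes,
                                            int docBase /* sum of maxDoc() of all prior segments */) {
                for (Map.Entry<Term, Integer> e : dwptPendingDeletes.entrySet()) {
                  // Overwrites any earlier entry for the same term (the problem noted below).
                  deletesFlushed.put(e.getKey(), docBase + e.getValue());
                }
                dwptPendingDeletes.clear();
              }
            }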

          Jason Rutherglen added a comment -

          When a new segment is flushed, then the docid-upto is reset to what's in the latest segment? Aren't we then losing point-in-timeness of the docid-uptos per DWPT by simply resetting the docid-upto to the highest value of the newest segment?

          Michael McCandless added a comment -

          When a new segment is flushed, then the docid-upto is reset to what's in the latest segment?

          Do you mean the flushedDocCount (in DocumentsWriter), by "docid-upto"? Ie, this is what we use to "globalize" the docid-upto stored for each delete Term/Query.

          Aren't we then losing point-in-timeness of the docid-uptos per DWPT by simply resetting the docid-upto to the highest value of the newest segment?

          I'm confused... that flushedDocCount is just an int, tracking the total doc count of all already-flushed segments. This shouldn't affect point-in-timeness....?

          Jason Rutherglen added a comment -

          If there are 2 DWPTs, and each has the following pending deletes:

          DWPT #1 maxdoc: 50; term id:1 limit:25

          DWPT #2 maxdoc: 45; term id:1 limit:30

          Where limit is the docid-upto stored as an int value in the pending deletes map. If we flush one of these DWPTs without first applying the deletes to a bitvector, and we flush the DWPTs in order, then DWPT #1 would flush the deletes and the term doc limit would be reset to segmentinfos maxdoc + 25. However, there's a 50 - 25 = 25 gap. When DWPT #2 is flushed, the limit for term id:1 is set to segmentinfos maxdoc + 50 + 30. Eg, we're effectively losing DWPT #1's limit (26 - 50)?

          Michael McCandless added a comment -

          Just to confirm: when a delete is done, we go and buffer that delete into each DWPT, right? (Mapped to the then-current docid-upto for that DWPT).

          OK I see the problem. It's because we now have a single pool for deletesFlushed, right? Ie, DWPT #2 will overwrite the term id:1 entry.

          But, I think the switch to generations of pending deletes (LUCENE-2680) would fix this? Maybe we should go do that one first...

          Ie, DWPT #1's flush would enter a new gen delete pool (maps term -> global docid-upto). Then DWPT #2's flush would also enter a new gen delete pool. Hmm, but not quite... the generations can't simply stack on top of one another. I think there's a graph structure somehow? Ie every DWPT that's flushed must record the segments that existed (were already flushed) when it was first created, because it's only those segments that should get the deleted term. Segments in the future obviously shouldn't get it. And segments from parallel DWPTs (ie that existed at the same time) should also not get it since they will separately track the deleted term.

          BTW, I think this makes LUCENE-2679 all the more important (the ability to delete such that delete will only apply to already-committed segments), since this'd mean we only store the pending delete in a single map instead of map per DWPT.

          Jason Rutherglen added a comment -

          I think the switch to generations of pending deletes (LUCENE-2680) would fix this? Maybe we should go do that one first.

          Yes, let's work on that one, as it directly relates to solving this issue. The generational system won't work for DWPT pending flushed deletes for the reasons described (ie, the docid-upto/limit will be inappropriately globalized).

          I think there's a graph structure somehow? Ie every DWPT that's flushed must record the segments that existed (were already flushed) when it was first created, because it's only those segments that should get the deleted term

          You're saying record a list of segments that existed at the time of flushing a DWPT's deletes? Let's get that data structure mapped out to start on LUCENE-2680?

          Michael McCandless added a comment -

          You're saying record a list of segments that existed at the time of flushing a DWPT's deletes?

          Actually, I think it's simpler! I think the DWPT just records the index of the last segment in the index, as of when it is created (or re-inited after it's been flushed).

          On flush of a given DWPT, its buffered deletes are recorded against that segment, and still carry over the lastSegmentIndex. This way, when we finally do resolve these deletes to docIDs, we 1) always apply the delete if the segment is <= lastSegmentIndex, or 2) apply it if the doc is in that segment and its docID is <= the docid-upto. I think this'd mean we can keep the docid-upto as local docIDs, which is nice (no globalizing/shifting-on-merge/flush needed).

          So, segments flushed in the current IW session will carry this private pool of pending deletes. But, pre-existing segments in the index don't need their own pool. Instead, when it's time to resolve the buffered deletes against them (because they are about to be merged), they must walk all of the per-segment pools, resolving the deletes from that pool if its segment index is <= the lastSegmentIndex of that pool. We should take care to efficiently handle dup'd terms, ie where the same del term is present in multiple pools. The most recent one "wins", and we should do only one delete (per segment) for that term.

          These per-segment delete pools must be updated on merge. EG if the lastSegmentIndex of a pool gets merged, that's fine, but then on merge commit we must move that lastSegmentIndex "backwards" to the last segment before the merge, because any deletes necessary within the segment will have been resolved already.

          When segments with del pools are merged, we obviously apply the deletes to the segments being merged, but, then, we have to coalesce those pools and move them into a single pool on the segment just before the merge. We could actually use a Set at this point since there is no more docid-upto for this pool (ie, it applies to all docs on that segment and in segments prior to it).

          So I think this is much simpler than I first thought!

          Let's get that data structure mapped out to start on LUCENE-2680?

          +1
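
          A compact sketch of the resolution rule described above (per-segment
          pools, each carrying the lastSegmentIndex recorded when its DWPT was
          created; names are hypothetical, and the exclusive docid-upto bound
          is an assumption):

            // Hypothetical sketch of the resolution rule: a pending delete from a
            // flushed DWPT's pool applies to a doc in segment `seg` if either
            //   1) seg is a prior segment, ie seg <= the lastSegmentIndex recorded
            //      when that DWPT was created, or
            //   2) seg is the pool's own segment and the doc's local docID is below
            //      the docid-upto recorded with the delete (exclusive bound assumed).
            // Segments flushed later, or by parallel DWPTs, do not get the delete.
            class DeletePoolSketch {
              static boolean deleteApplies(int seg, int owningSeg, int lastSegmentIndex,
                                           int localDocID, int docIDUpto) {
                if (seg == owningSeg) {
                  return localDocID < docIDUpto;   // local docIDs, no globalizing needed
                }
                return seg <= lastSegmentIndex;
              }
            }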

          Jason Rutherglen added a comment -

          Can you explain what you mean by "per-segment pools" compared with the flushed deletes data structure in there today, ie, why are they pools?

          I can start on the simplified version of this for the non-DWPT use case, eg, LUCENE-2680. That version'll implement the delete pool per segment only for those segments created after the last flush/commit?

          Michael McCandless added a comment -

          Can you explain what you mean by "per-segment pools" compared with the flushed deletes data structure in there today, ie, why are they pools?

          Sorry, by pool I mean what we have today. Ie a map from del term/query -> global docid-upto.

          Except, with this change it will be per-segment, only for those segments flushed since IW was opened or since last commit.

          I can start on the simplified version of this for the non-DWPT use case, eg, LUCENE-2680.

          Yes, please!

          This approach works fine for the single-DWPT case (ie what we now have). It'll be good to get it first working on this simpler case, then port to the RT branch to get it working for multiple DWPTs.

          That version'll implement the delete pool per segment only for those segments created after the last flush/commit?

          Created after IW was opened or after last commit, yes.

          I think we'll need an applyAllDeletes (called by commit) vs an applyDeletesToSegment (called by merge).
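
          Hypothetical signatures for those two entry points (not the actual
          IndexWriter API, just a sketch of the split):

            import java.io.IOException;
            import org.apache.lucene.index.SegmentInfo;
            import org.apache.lucene.index.SegmentInfos;

            // Hypothetical sketch: commit resolves every buffered delete, while a
            // merge resolves them only against the segments it is about to merge.
            interface BufferedDeletesApplier {
              // Called by commit/getReader: apply all buffered deletes to all segments.
              void applyAllDeletes(SegmentInfos segmentInfos) throws IOException;

              // Called at merge kickoff, once per segment being merged.
              void applyDeletesToSegment(SegmentInfo segmentToMerge) throws IOException;
            }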

          Jason Rutherglen added a comment -

          Per-segment deletes were implemented as a part of LUCENE-2680.


            People

            • Assignee: Unassigned
            • Reporter: Jason Rutherglen
            • Votes: 0
            • Watchers: 1
