Lucene - Core
  1. Lucene - Core
  2. LUCENE-5116

IW.addIndexes doesn't prune all deleted segments

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 4.5, 6.0
    • Component/s: None
    • Labels:
      None
    • Lucene Fields:
      New, Patch Available

      Description

      at the least, this can easily create segments with maxDoc == 0.

      It seems buggy: elsewhere we prune these segments out, so its expected to have a commit point with no segments rather than a segment with 0 documents...

      1. LUCENE-5116_test.patch
        0.8 kB
        Robert Muir
      2. LUCENE-5116.patch
        4 kB
        Shai Erera

        Activity

        Hide
        Shai Erera added a comment -

        +1. This should be an easy and nice improvement. There's no need to spend the work adding those segments just so they are dropped on the next merge.

        Show
        Shai Erera added a comment - +1. This should be an easy and nice improvement. There's no need to spend the work adding those segments just so they are dropped on the next merge.
        Hide
        Michael McCandless added a comment -

        +1

        Show
        Michael McCandless added a comment - +1
        Hide
        Shai Erera added a comment -

        I fixed IW.addIndexes to drop empty segments and Rob's test passes. However I fail to reproduce the case where you add a segment with all documents deleted. No matter what I do – call IW.deleteDocuments(new MADQ()) or IW.deleteDocuments(new TermQuery()) – basically a query that matches all the documents in a segment, the segment is dropped on the next commit. Is it even possible to add a segment with all documents deleted?

        The fix is general (compares segment.numDocs > 0), so it will capture those cases too ... only I currently fail to reproduce one. Any ideas how I can repro it?

        Show
        Shai Erera added a comment - I fixed IW.addIndexes to drop empty segments and Rob's test passes. However I fail to reproduce the case where you add a segment with all documents deleted. No matter what I do – call IW.deleteDocuments(new MADQ()) or IW.deleteDocuments(new TermQuery()) – basically a query that matches all the documents in a segment, the segment is dropped on the next commit. Is it even possible to add a segment with all documents deleted? The fix is general (compares segment.numDocs > 0), so it will capture those cases too ... only I currently fail to reproduce one. Any ideas how I can repro it?
        Hide
        Robert Muir added a comment -

        so maybe only the empty segment special case exists? If you have an all-deleted one, it should have already been pruned... I think

        But we should be able to fake one with a FilterReader if we really want...

        Show
        Robert Muir added a comment - so maybe only the empty segment special case exists? If you have an all-deleted one, it should have already been pruned... I think But we should be able to fake one with a FilterReader if we really want...
        Hide
        Shai Erera added a comment -

        Patch adds tests for empty index and all deleted (fake) segment (thanks Rob!) and fixes IW.addIndexes.

        Rob and I chatted about adding a check to CheckIndex, asserting there are no empty segments. But I'm not sure we should, because technically it's not an error. I.e. you could do new IW().commit() and it's a perfectly valid index.

        This issue is more an optimization I guess than a bug – don't do wasted work on segments that will be dropped.

        Show
        Shai Erera added a comment - Patch adds tests for empty index and all deleted (fake) segment (thanks Rob!) and fixes IW.addIndexes. Rob and I chatted about adding a check to CheckIndex, asserting there are no empty segments. But I'm not sure we should, because technically it's not an error. I.e. you could do new IW().commit() and it's a perfectly valid index. This issue is more an optimization I guess than a bug – don't do wasted work on segments that will be dropped.
        Hide
        Robert Muir added a comment -

        asserting there are no empty segments. But I'm not sure we should, because technically it's not an error. I.e. you could do new IW().commit() and it's a perfectly valid index.

        Wait: I still think we need to. Ill open an issue and take care of it.

        The difference is: new IW().commit() should make a "empty commit" (segments_N with no segments). Not a 0 document segment...

        Show
        Robert Muir added a comment - asserting there are no empty segments. But I'm not sure we should, because technically it's not an error. I.e. you could do new IW().commit() and it's a perfectly valid index. Wait: I still think we need to. Ill open an issue and take care of it. The difference is: new IW().commit() should make a "empty commit" (segments_N with no segments). Not a 0 document segment...
        Hide
        Shai Erera added a comment -

        Oh, I guess you're right. If you're going to resolve this in a new issue, I think this one is ready to go. I'll add a CHANGES entry.

        Show
        Shai Erera added a comment - Oh, I guess you're right. If you're going to resolve this in a new issue, I think this one is ready to go. I'll add a CHANGES entry.
        Hide
        Michael McCandless added a comment -

        +1

        Show
        Michael McCandless added a comment - +1
        Hide
        Robert Muir added a comment -

        +1 to this patch Shai... thanks!

        Show
        Robert Muir added a comment - +1 to this patch Shai... thanks!
        Hide
        ASF subversion and git services added a comment -

        Commit 1504860 from Shai Erera in branch 'dev/trunk'
        [ https://svn.apache.org/r1504860 ]

        LUCENE-5116: IW.addIndexes doesn't prune all deleted segments

        Show
        ASF subversion and git services added a comment - Commit 1504860 from Shai Erera in branch 'dev/trunk' [ https://svn.apache.org/r1504860 ] LUCENE-5116 : IW.addIndexes doesn't prune all deleted segments
        Hide
        ASF subversion and git services added a comment -

        Commit 1504867 from Shai Erera in branch 'dev/branches/branch_4x'
        [ https://svn.apache.org/r1504867 ]

        LUCENE-5116: IW.addIndexes doesn't prune all deleted segments

        Show
        ASF subversion and git services added a comment - Commit 1504867 from Shai Erera in branch 'dev/branches/branch_4x' [ https://svn.apache.org/r1504867 ] LUCENE-5116 : IW.addIndexes doesn't prune all deleted segments
        Hide
        Shai Erera added a comment -

        Committed to trunk and 4x. Thanks Rob!

        Do you think I should back-port it to 4.4.x, in case we'll do another RC?

        Show
        Shai Erera added a comment - Committed to trunk and 4x. Thanks Rob! Do you think I should back-port it to 4.4.x, in case we'll do another RC?
        Hide
        Steve Rowe added a comment -

        This issue is more an optimization I guess than a bug – don't do wasted work on segments that will be dropped.

        Do you think I should back-port it to 4.4.x, in case we'll do another RC?

        I already have an RC ready to go that doesn't include this, so given that you don't characterize it as a bug fix, I'd rather not backport and delay further.

        Show
        Steve Rowe added a comment - This issue is more an optimization I guess than a bug – don't do wasted work on segments that will be dropped. Do you think I should back-port it to 4.4.x, in case we'll do another RC? I already have an RC ready to go that doesn't include this, so given that you don't characterize it as a bug fix, I'd rather not backport and delay further.
        Hide
        Robert Muir added a comment -

        +1

        I created it as a "bug" because its the only way with lucene you can get a 0 document segment... but you could say its not really a bug just an inconsistency...

        Show
        Robert Muir added a comment - +1 I created it as a "bug" because its the only way with lucene you can get a 0 document segment... but you could say its not really a bug just an inconsistency...
        Hide
        ASF subversion and git services added a comment -

        Commit 1513487 from Uwe Schindler in branch 'dev/trunk'
        [ https://svn.apache.org/r1513487 ]

        LUCENE-5116: Simplify test to use MatchNoBits instead own impl

        Show
        ASF subversion and git services added a comment - Commit 1513487 from Uwe Schindler in branch 'dev/trunk' [ https://svn.apache.org/r1513487 ] LUCENE-5116 : Simplify test to use MatchNoBits instead own impl
        Hide
        ASF subversion and git services added a comment -

        Commit 1513488 from Uwe Schindler in branch 'dev/branches/branch_4x'
        [ https://svn.apache.org/r1513488 ]

        Merged revision(s) 1513487 from lucene/dev/trunk:
        LUCENE-5116: Simplify test to use MatchNoBits instead own impl

        Show
        ASF subversion and git services added a comment - Commit 1513488 from Uwe Schindler in branch 'dev/branches/branch_4x' [ https://svn.apache.org/r1513488 ] Merged revision(s) 1513487 from lucene/dev/trunk: LUCENE-5116 : Simplify test to use MatchNoBits instead own impl
        Hide
        ASF subversion and git services added a comment -

        Commit 1513955 from Robert Muir in branch 'dev/trunk'
        [ https://svn.apache.org/r1513955 ]

        LUCENE-5173: add checkindex piece of LUCENE-5116, prevent 0 document segments completely

        Show
        ASF subversion and git services added a comment - Commit 1513955 from Robert Muir in branch 'dev/trunk' [ https://svn.apache.org/r1513955 ] LUCENE-5173 : add checkindex piece of LUCENE-5116 , prevent 0 document segments completely
        Hide
        ASF subversion and git services added a comment -

        Commit 1513958 from Robert Muir in branch 'dev/branches/branch_4x'
        [ https://svn.apache.org/r1513958 ]

        LUCENE-5173: add checkindex piece of LUCENE-5116, prevent 0 document segments completely

        Show
        ASF subversion and git services added a comment - Commit 1513958 from Robert Muir in branch 'dev/branches/branch_4x' [ https://svn.apache.org/r1513958 ] LUCENE-5173 : add checkindex piece of LUCENE-5116 , prevent 0 document segments completely
        Hide
        Adrien Grand added a comment -

        4.5 release -> bulk close

        Show
        Adrien Grand added a comment - 4.5 release -> bulk close

          People

          • Assignee:
            Shai Erera
            Reporter:
            Robert Muir
          • Votes:
            0 Vote for this issue
            Watchers:
            6 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development