Uploaded image for project: 'Lucene - Core'
  1. Lucene - Core
  2. LUCENE-5116

IW.addIndexes doesn't prune all deleted segments

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 4.5, 6.0
    • Component/s: None
    • Labels:
      None
    • Lucene Fields:
      New, Patch Available

      Description

      at the least, this can easily create segments with maxDoc == 0.

      It seems buggy: elsewhere we prune these segments out, so its expected to have a commit point with no segments rather than a segment with 0 documents...

      1. LUCENE-5116_test.patch
        0.8 kB
        Robert Muir
      2. LUCENE-5116.patch
        4 kB
        Shai Erera

        Activity

        Hide
        shaie Shai Erera added a comment -

        +1. This should be an easy and nice improvement. There's no need to spend the work adding those segments just so they are dropped on the next merge.

        Show
        shaie Shai Erera added a comment - +1. This should be an easy and nice improvement. There's no need to spend the work adding those segments just so they are dropped on the next merge.
        Hide
        mikemccand Michael McCandless added a comment -

        +1

        Show
        mikemccand Michael McCandless added a comment - +1
        Hide
        shaie Shai Erera added a comment -

        I fixed IW.addIndexes to drop empty segments and Rob's test passes. However I fail to reproduce the case where you add a segment with all documents deleted. No matter what I do – call IW.deleteDocuments(new MADQ()) or IW.deleteDocuments(new TermQuery()) – basically a query that matches all the documents in a segment, the segment is dropped on the next commit. Is it even possible to add a segment with all documents deleted?

        The fix is general (compares segment.numDocs > 0), so it will capture those cases too ... only I currently fail to reproduce one. Any ideas how I can repro it?

        Show
        shaie Shai Erera added a comment - I fixed IW.addIndexes to drop empty segments and Rob's test passes. However I fail to reproduce the case where you add a segment with all documents deleted. No matter what I do – call IW.deleteDocuments(new MADQ()) or IW.deleteDocuments(new TermQuery()) – basically a query that matches all the documents in a segment, the segment is dropped on the next commit. Is it even possible to add a segment with all documents deleted? The fix is general (compares segment.numDocs > 0), so it will capture those cases too ... only I currently fail to reproduce one. Any ideas how I can repro it?
        Hide
        rcmuir Robert Muir added a comment -

        so maybe only the empty segment special case exists? If you have an all-deleted one, it should have already been pruned... I think

        But we should be able to fake one with a FilterReader if we really want...

        Show
        rcmuir Robert Muir added a comment - so maybe only the empty segment special case exists? If you have an all-deleted one, it should have already been pruned... I think But we should be able to fake one with a FilterReader if we really want...
        Hide
        shaie Shai Erera added a comment -

        Patch adds tests for empty index and all deleted (fake) segment (thanks Rob!) and fixes IW.addIndexes.

        Rob and I chatted about adding a check to CheckIndex, asserting there are no empty segments. But I'm not sure we should, because technically it's not an error. I.e. you could do new IW().commit() and it's a perfectly valid index.

        This issue is more an optimization I guess than a bug – don't do wasted work on segments that will be dropped.

        Show
        shaie Shai Erera added a comment - Patch adds tests for empty index and all deleted (fake) segment (thanks Rob!) and fixes IW.addIndexes. Rob and I chatted about adding a check to CheckIndex, asserting there are no empty segments. But I'm not sure we should, because technically it's not an error. I.e. you could do new IW().commit() and it's a perfectly valid index. This issue is more an optimization I guess than a bug – don't do wasted work on segments that will be dropped.
        Hide
        rcmuir Robert Muir added a comment -

        asserting there are no empty segments. But I'm not sure we should, because technically it's not an error. I.e. you could do new IW().commit() and it's a perfectly valid index.

        Wait: I still think we need to. Ill open an issue and take care of it.

        The difference is: new IW().commit() should make a "empty commit" (segments_N with no segments). Not a 0 document segment...

        Show
        rcmuir Robert Muir added a comment - asserting there are no empty segments. But I'm not sure we should, because technically it's not an error. I.e. you could do new IW().commit() and it's a perfectly valid index. Wait: I still think we need to. Ill open an issue and take care of it. The difference is: new IW().commit() should make a "empty commit" (segments_N with no segments). Not a 0 document segment...
        Hide
        shaie Shai Erera added a comment -

        Oh, I guess you're right. If you're going to resolve this in a new issue, I think this one is ready to go. I'll add a CHANGES entry.

        Show
        shaie Shai Erera added a comment - Oh, I guess you're right. If you're going to resolve this in a new issue, I think this one is ready to go. I'll add a CHANGES entry.
        Hide
        mikemccand Michael McCandless added a comment -

        +1

        Show
        mikemccand Michael McCandless added a comment - +1
        Hide
        rcmuir Robert Muir added a comment -

        +1 to this patch Shai... thanks!

        Show
        rcmuir Robert Muir added a comment - +1 to this patch Shai... thanks!
        Hide
        jira-bot ASF subversion and git services added a comment -

        Commit 1504860 from Shai Erera in branch 'dev/trunk'
        [ https://svn.apache.org/r1504860 ]

        LUCENE-5116: IW.addIndexes doesn't prune all deleted segments

        Show
        jira-bot ASF subversion and git services added a comment - Commit 1504860 from Shai Erera in branch 'dev/trunk' [ https://svn.apache.org/r1504860 ] LUCENE-5116 : IW.addIndexes doesn't prune all deleted segments
        Hide
        jira-bot ASF subversion and git services added a comment -

        Commit 1504867 from Shai Erera in branch 'dev/branches/branch_4x'
        [ https://svn.apache.org/r1504867 ]

        LUCENE-5116: IW.addIndexes doesn't prune all deleted segments

        Show
        jira-bot ASF subversion and git services added a comment - Commit 1504867 from Shai Erera in branch 'dev/branches/branch_4x' [ https://svn.apache.org/r1504867 ] LUCENE-5116 : IW.addIndexes doesn't prune all deleted segments
        Hide
        shaie Shai Erera added a comment -

        Committed to trunk and 4x. Thanks Rob!

        Do you think I should back-port it to 4.4.x, in case we'll do another RC?

        Show
        shaie Shai Erera added a comment - Committed to trunk and 4x. Thanks Rob! Do you think I should back-port it to 4.4.x, in case we'll do another RC?
        Hide
        steve_rowe Steve Rowe added a comment -

        This issue is more an optimization I guess than a bug – don't do wasted work on segments that will be dropped.

        Do you think I should back-port it to 4.4.x, in case we'll do another RC?

        I already have an RC ready to go that doesn't include this, so given that you don't characterize it as a bug fix, I'd rather not backport and delay further.

        Show
        steve_rowe Steve Rowe added a comment - This issue is more an optimization I guess than a bug – don't do wasted work on segments that will be dropped. Do you think I should back-port it to 4.4.x, in case we'll do another RC? I already have an RC ready to go that doesn't include this, so given that you don't characterize it as a bug fix, I'd rather not backport and delay further.
        Hide
        rcmuir Robert Muir added a comment -

        +1

        I created it as a "bug" because its the only way with lucene you can get a 0 document segment... but you could say its not really a bug just an inconsistency...

        Show
        rcmuir Robert Muir added a comment - +1 I created it as a "bug" because its the only way with lucene you can get a 0 document segment... but you could say its not really a bug just an inconsistency...
        Hide
        jira-bot ASF subversion and git services added a comment -

        Commit 1513487 from Uwe Schindler in branch 'dev/trunk'
        [ https://svn.apache.org/r1513487 ]

        LUCENE-5116: Simplify test to use MatchNoBits instead own impl

        Show
        jira-bot ASF subversion and git services added a comment - Commit 1513487 from Uwe Schindler in branch 'dev/trunk' [ https://svn.apache.org/r1513487 ] LUCENE-5116 : Simplify test to use MatchNoBits instead own impl
        Hide
        jira-bot ASF subversion and git services added a comment -

        Commit 1513488 from Uwe Schindler in branch 'dev/branches/branch_4x'
        [ https://svn.apache.org/r1513488 ]

        Merged revision(s) 1513487 from lucene/dev/trunk:
        LUCENE-5116: Simplify test to use MatchNoBits instead own impl

        Show
        jira-bot ASF subversion and git services added a comment - Commit 1513488 from Uwe Schindler in branch 'dev/branches/branch_4x' [ https://svn.apache.org/r1513488 ] Merged revision(s) 1513487 from lucene/dev/trunk: LUCENE-5116 : Simplify test to use MatchNoBits instead own impl
        Hide
        jira-bot ASF subversion and git services added a comment -

        Commit 1513955 from Robert Muir in branch 'dev/trunk'
        [ https://svn.apache.org/r1513955 ]

        LUCENE-5173: add checkindex piece of LUCENE-5116, prevent 0 document segments completely

        Show
        jira-bot ASF subversion and git services added a comment - Commit 1513955 from Robert Muir in branch 'dev/trunk' [ https://svn.apache.org/r1513955 ] LUCENE-5173 : add checkindex piece of LUCENE-5116 , prevent 0 document segments completely
        Hide
        jira-bot ASF subversion and git services added a comment -

        Commit 1513958 from Robert Muir in branch 'dev/branches/branch_4x'
        [ https://svn.apache.org/r1513958 ]

        LUCENE-5173: add checkindex piece of LUCENE-5116, prevent 0 document segments completely

        Show
        jira-bot ASF subversion and git services added a comment - Commit 1513958 from Robert Muir in branch 'dev/branches/branch_4x' [ https://svn.apache.org/r1513958 ] LUCENE-5173 : add checkindex piece of LUCENE-5116 , prevent 0 document segments completely
        Hide
        jpountz Adrien Grand added a comment -

        4.5 release -> bulk close

        Show
        jpountz Adrien Grand added a comment - 4.5 release -> bulk close

          People

          • Assignee:
            shaie Shai Erera
            Reporter:
            rcmuir Robert Muir
          • Votes:
            0 Vote for this issue
            Watchers:
            6 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development