Details

    • Type: Improvement Improvement
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 4.5, 6.0
    • Component/s: None
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      LUCENE-5116 fixes addIndexes(Reader) to never write a 0-document segment (in the case you merge in empty or all-deleted stuff).

      I considered it just an inconsistency, but it could cause confusing exceptions to real users too if there was a "regression" here. (see solr users list:Split Shard Error - maxValue must be non-negative).

      1. LUCENE-5173_ugly.patch
        9 kB
        Robert Muir
      2. LUCENE-5173.patch
        10 kB
        Robert Muir
      3. LUCENE-5173.patch
        1 kB
        Robert Muir

        Activity

        Hide
        Robert Muir added a comment -

        Simple patch: also adds an assert to SegmentMerger.

        we can only check this if the index is 4.5+, because thats when LUCENE-5116 was fixed.

        Show
        Robert Muir added a comment - Simple patch: also adds an assert to SegmentMerger. we can only check this if the index is 4.5+, because thats when LUCENE-5116 was fixed.
        Hide
        Robert Muir added a comment -

        just a slight simplification of the logic

        Show
        Robert Muir added a comment - just a slight simplification of the logic
        Hide
        Robert Muir added a comment -

        deleted the wrongly-named patch, sorry

        Show
        Robert Muir added a comment - deleted the wrongly-named patch, sorry
        Hide
        Uwe Schindler added a comment -

        Patch is fine. I like that the checkindex allows older segments with empty size, but once a segment was merged it can no longer be empty.

        Maybe the assert in SegmentMerge should be a hard check, unless SegmentMerger always strictly throws away empty segments (not that somebody can somehow with a crazy alcoholic mergepolicy create those segments again).

        Show
        Uwe Schindler added a comment - Patch is fine. I like that the checkindex allows older segments with empty size, but once a segment was merged it can no longer be empty. Maybe the assert in SegmentMerge should be a hard check, unless SegmentMerger always strictly throws away empty segments (not that somebody can somehow with a crazy alcoholic mergepolicy create those segments again).
        Hide
        Robert Muir added a comment -

        Maybe the assert in SegmentMerge should be a hard check, unless SegmentMerger always strictly throws away empty segments (not that somebody can somehow with a crazy alcoholic mergepolicy create those segments again).

        Or, maybe mergeState.segmentInfo.setDocCount(setDocMaps()) should happen in the ctor of SegmentMerger instead of line 1 of merge()?

        And it could a simple boolean method like shouldMerge(): returns docCount > 0, called by addIndexes and mergeMiddle?

        this way the logic added to addIndexes in LUCENE-5116 wouldnt even need to be there, and we'd feel better that we arent writing such 0 document segments (which codecs are not prepared to handle today).

        Show
        Robert Muir added a comment - Maybe the assert in SegmentMerge should be a hard check, unless SegmentMerger always strictly throws away empty segments (not that somebody can somehow with a crazy alcoholic mergepolicy create those segments again). Or, maybe mergeState.segmentInfo.setDocCount(setDocMaps()) should happen in the ctor of SegmentMerger instead of line 1 of merge()? And it could a simple boolean method like shouldMerge(): returns docCount > 0, called by addIndexes and mergeMiddle? this way the logic added to addIndexes in LUCENE-5116 wouldnt even need to be there, and we'd feel better that we arent writing such 0 document segments (which codecs are not prepared to handle today).
        Hide
        Michael McCandless added a comment -

        +1, I like consolidating the logic into a single shouldMerge().

        And I don't think codecs should be required to handle the 0 doc segment case: we should never send such a segment to them.

        Show
        Michael McCandless added a comment - +1, I like consolidating the logic into a single shouldMerge(). And I don't think codecs should be required to handle the 0 doc segment case: we should never send such a segment to them.
        Hide
        Uwe Schindler added a comment -

        I agree with both. My complaint was the following:
        The assert was not correct, as asserts should only be used for real assertions withing the same class. For this special check, there is something outside of SegmentMerger that could maybe insert empty readers into the merge queue, so those should be thrown away while merging or when sergmentmerger initializes (so move to ctor is a good idea). I am thinking about crazy stuff like a merge policy that wraps with a FilterAtomicReader to filter while merging (like IndexSorter) - which is possible with the current API.

        So the segments should be removed on creating the SegmentMerger when all readers to merge are already in the List<AtomicReader>.

        In the IndexWriter#addIndexes we may then just need the top-level check to not even start a merge.

        Show
        Uwe Schindler added a comment - I agree with both. My complaint was the following: The assert was not correct, as asserts should only be used for real assertions withing the same class. For this special check, there is something outside of SegmentMerger that could maybe insert empty readers into the merge queue, so those should be thrown away while merging or when sergmentmerger initializes (so move to ctor is a good idea). I am thinking about crazy stuff like a merge policy that wraps with a FilterAtomicReader to filter while merging (like IndexSorter) - which is possible with the current API. So the segments should be removed on creating the SegmentMerger when all readers to merge are already in the List<AtomicReader>. In the IndexWriter#addIndexes we may then just need the top-level check to not even start a merge.
        Hide
        Robert Muir added a comment -

        here is a ugly patch, there must be a better way... sorry

        I wonder if its too paranoid: however playing with the old patch I think i hit my own assert with testThreadInterruptDeadLock...

        I will investigate that more, to see under what conditions we are doing these 0 doc merges today.

        Show
        Robert Muir added a comment - here is a ugly patch, there must be a better way... sorry I wonder if its too paranoid: however playing with the old patch I think i hit my own assert with testThreadInterruptDeadLock... I will investigate that more, to see under what conditions we are doing these 0 doc merges today.
        Hide
        Robert Muir added a comment -

        Here's a cleaned up version... maybe its OK.

        As far as the stuff i saw with the first patch on this issue, maybe it was due to running tests from eclipse (I beasted TestIndexWriter with it from curiousity, but nothing came out)... its old news anyway I guess.

        Show
        Robert Muir added a comment - Here's a cleaned up version... maybe its OK. As far as the stuff i saw with the first patch on this issue, maybe it was due to running tests from eclipse (I beasted TestIndexWriter with it from curiousity, but nothing came out)... its old news anyway I guess.
        Hide
        Uwe Schindler added a comment -

        Hi Robert,
        like this much! I have no tested it, but I trust you. I think this is much better than risking to create empty segments because maybe some new functionality/merge policy/... hides documents or adds empty segments to the List of merges.

        Show
        Uwe Schindler added a comment - Hi Robert, like this much! I have no tested it, but I trust you. I think this is much better than risking to create empty segments because maybe some new functionality/merge policy/... hides documents or adds empty segments to the List of merges.
        Hide
        Michael McCandless added a comment -

        +1, patch looks good.

        Show
        Michael McCandless added a comment - +1, patch looks good.
        Hide
        ASF subversion and git services added a comment -

        Commit 1513955 from Robert Muir in branch 'dev/trunk'
        [ https://svn.apache.org/r1513955 ]

        LUCENE-5173: add checkindex piece of LUCENE-5116, prevent 0 document segments completely

        Show
        ASF subversion and git services added a comment - Commit 1513955 from Robert Muir in branch 'dev/trunk' [ https://svn.apache.org/r1513955 ] LUCENE-5173 : add checkindex piece of LUCENE-5116 , prevent 0 document segments completely
        Hide
        ASF subversion and git services added a comment -

        Commit 1513958 from Robert Muir in branch 'dev/branches/branch_4x'
        [ https://svn.apache.org/r1513958 ]

        LUCENE-5173: add checkindex piece of LUCENE-5116, prevent 0 document segments completely

        Show
        ASF subversion and git services added a comment - Commit 1513958 from Robert Muir in branch 'dev/branches/branch_4x' [ https://svn.apache.org/r1513958 ] LUCENE-5173 : add checkindex piece of LUCENE-5116 , prevent 0 document segments completely
        Hide
        Robert Muir added a comment -

        Thanks guys

        Show
        Robert Muir added a comment - Thanks guys
        Hide
        Uwe Schindler added a comment -

        Great work!

        Show
        Uwe Schindler added a comment - Great work!
        Hide
        Adrien Grand added a comment -

        4.5 release -> bulk close

        Show
        Adrien Grand added a comment - 4.5 release -> bulk close

          People

          • Assignee:
            Unassigned
            Reporter:
            Robert Muir
          • Votes:
            0 Vote for this issue
            Watchers:
            3 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development