  1. Lucene - Core
  2. LUCENE-3197

Optimize runs forever if you keep deleting docs at the same time

Details

    • Type: Bug
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 3.3, 4.0-ALPHA
    • Component/s: core/index
    • Labels: None
    • Lucene Fields: New

    Description

      Because we "cascade" merges for an optimize, if you also delete documents while the merges are running, the merge policy sees the resulting single segment as still not optimized (since it has pending deletes) and does another single-segment merge. This repeats indefinitely, for as long as your app keeps deleting docs.
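The loop described above can be reproduced with a toy model. This is only a sketch: `ToySegment`, `CascadeLoopDemo`, `isOptimized`, and `runOptimize` are hypothetical names invented for illustration and are not Lucene classes or methods, and the `maxRounds` cap exists only so the demo terminates.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of one segment; NOT a Lucene class. */
final class ToySegment {
    final String name;
    int pendingDeletes;

    ToySegment(String name, int pendingDeletes) {
        this.name = name;
        this.pendingDeletes = pendingDeletes;
    }
}

final class CascadeLoopDemo {
    /** Hypothetical check: "optimized" means exactly one segment with no pending deletes. */
    static boolean isOptimized(List<ToySegment> segments) {
        return segments.size() == 1 && segments.get(0).pendingDeletes == 0;
    }

    /**
     * Cascades merges until the index looks optimized or maxRounds is reached.
     * When deleteDuringMerge is true, the app deletes one doc while each merge
     * runs, so the merged segment already has a pending delete when published.
     * Returns the number of merges performed.
     */
    static int runOptimize(List<ToySegment> segments, boolean deleteDuringMerge, int maxRounds) {
        int merges = 0;
        while (!isOptimized(segments) && merges < maxRounds) {
            // Merge everything into one segment; deletes known at merge start are dropped.
            ToySegment merged = new ToySegment("merged" + merges, 0);
            if (deleteDuringMerge) {
                merged.pendingDeletes = 1; // a delete arrived while the merge was running
            }
            segments.clear();
            segments.add(merged);
            merges++;
        }
        return merges;
    }

    static List<ToySegment> threeSegments() {
        List<ToySegment> segments = new ArrayList<>();
        segments.add(new ToySegment("_0", 0));
        segments.add(new ToySegment("_1", 2));
        segments.add(new ToySegment("_2", 0));
        return segments;
    }

    public static void main(String[] args) {
        System.out.println(runOptimize(threeSegments(), false, 10)); // prints 1
        System.out.println(runOptimize(threeSegments(), true, 10));  // prints 10 (hits the cap)
    }
}
```

Without concurrent deletes a single merge finishes the job; with them, every round produces a segment that still looks unoptimized, so only the artificial cap stops the loop.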

      Attachments

        1. LUCENE-3197.patch
          18 kB
          Michael McCandless

        Activity

          mikemccand Michael McCandless added a comment -

          One simple way to fix this would be to have IndexWriter (IW) disregard the MergePolicy whenever it asks to do a single-segment merge of a segment that had already been produced by merging for the current optimize call.

          But... I don't really like this, as there could be some unusual MergePolicy out there that sometimes wants to do such merging.

          So I think a better solution, though API-breaking for MergePolicy (which is OK because it's @experimental), is to change the segmentsToOptimize argument; currently it's just a set recording which segments need to be optimized away. I think we should change it to a Map<String,Boolean>, where the Boolean indicates whether this segment had been created by a merge in the current optimize session. Then I'll fix our MPs to not cascade in such a case.
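A rough sketch of how a policy might use such a Map<String,Boolean> follows. It is not the attached patch and not the real MergePolicy API: the class `OptimizeSessionSketch`, the method `findMergeForOptimize`, and the `hasDeletes` map are assumptions made up for illustration, and the "merge everything" rule is a deliberately simplistic stand-in for a real policy.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

final class OptimizeSessionSketch {
    /**
     * Hypothetical stand-in for a merge policy's optimize hook.
     *
     * segmentsToOptimize: segment name -> true if the segment was produced by a
     *                     merge in the current optimize session.
     * hasDeletes:         segment name -> true if the segment has pending deletes.
     *
     * Returns the names of the segments to merge next, or an empty list when
     * the optimize session should be considered finished.
     */
    static List<String> findMergeForOptimize(Map<String, Boolean> segmentsToOptimize,
                                              Map<String, Boolean> hasDeletes) {
        List<String> names = new ArrayList<>(segmentsToOptimize.keySet());
        if (names.size() != 1) {
            // Zero segments: nothing to do. Several segments: merge them all (toy rule).
            return names;
        }
        String only = names.get(0);
        boolean producedByThisOptimize = segmentsToOptimize.get(only);
        boolean deletes = hasDeletes.getOrDefault(only, false);
        if (deletes && !producedByThisOptimize) {
            // A pre-existing single segment with deletes: rewrite it once.
            return names;
        }
        // The segment came from this session's own merge: do not cascade,
        // even if new deletes arrived while the merge was running.
        return new ArrayList<>();
    }
}
```

The only decision that changes relative to the looping behaviour is the last branch: deletes on a segment that this optimize session itself produced no longer trigger another single-segment merge.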

          hossman Chris M. Hostetter added a comment -

          Is the possibility of a never-ending optimize in this situation (never-ending deletes) really something we need to "fix"?

          I mean... isn't this what the user should expect? They've asked for a single segment without deletes, and then while we try to give it to them they keep deleting. How is it bad that optimize doesn't stop until it's completely done?

          yseeley@gmail.com Yonik Seeley added a comment -

          Regardless of whether one views this as a bug or not, I think the more useful semantics are to at least "merge all of the current segments into 1 and remove all currently deleted docs" (i.e. I agree with Mike). The alternative is that optimize is dangerous in the presence of index updates (i.e. applications should discontinue updates if they call optimize).

          mikemccand Michael McCandless added a comment -

          Right, this has been the intended semantics of a background optimize for some time, i.e., when it returns it only ensures that whatever was not optimized as of when it was called has been merged away.

          This already works correctly for newly added docs, meaning if you continue adding docs / flushing new segments while the optimize runs, it knows that the newly flushed segments do not have to be merged away.

          But for new deletions we are not handling it correctly, which leads to the forever-running merges.
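The "as of when it was called" semantics for newly flushed segments can be pictured as a snapshot of segment names taken at the start of the call. The sketch below is a toy under that assumption: `OptimizeSnapshotSketch`, `snapshot`, and `stillPending` are hypothetical names, not Lucene API, and the model covers only the added-docs case that the comment says already works. It does not track deletes at all.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class OptimizeSnapshotSketch {
    /** Records which segments exist at the moment optimize is called. */
    static Set<String> snapshot(List<String> liveSegments) {
        return new HashSet<>(liveSegments);
    }

    /**
     * Returns the live segments that the optimize call is still responsible for.
     * Segments flushed after the snapshot are ignored, so continued indexing
     * cannot keep the optimize call from finishing.
     */
    static List<String> stillPending(Set<String> snapshot, List<String> liveSegments) {
        List<String> pending = new ArrayList<>();
        for (String name : liveSegments) {
            if (snapshot.contains(name)) {
                pending.add(name);
            }
        }
        return pending;
    }

    public static void main(String[] args) {
        Set<String> atCall = snapshot(Arrays.asList("_0", "_1"));
        // "_2" was flushed after the call started: it is not pending.
        System.out.println(stillPending(atCall, Arrays.asList("_0", "_1", "_2"))); // prints [_0, _1]
        // "_0" and "_1" were merged into "_3"; "_2" is still a new flush.
        System.out.println(stillPending(atCall, Arrays.asList("_2", "_3")));       // prints []
    }
}
```

A name-based snapshot like this cannot tell whether a merged segment picked up new deletes while the merge ran, which is why deletions need separate handling.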

          mikemccand Michael McCandless added a comment -

          Patch.

          rcmuir Robert Muir added a comment -

          bulk close for 3.3

          tomoko Tomoko Uchida added a comment -

          This issue was moved to GitHub issue: #4270.

          People

            mikemccand Michael McCandless
            mikemccand Michael McCandless