Lucene - Core / LUCENE-3197

Optimize runs forever if you keep deleting docs at the same time

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 3.3, 4.0-ALPHA
    • Component/s: core/index
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      Because we "cascade" merges for an optimize, if you also delete documents while the merges are running, the merge policy sees the resulting single segment as still not optimized (since it has pending deletes) and does a single-segment merge. This repeats indefinitely, for as long as your app keeps deleting docs.
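
A toy model may make the loop easier to see. The sketch below is not Lucene code: `EndlessOptimizeSketch`, `countMerges`, and `deleteRounds` are hypothetical names, and it assumes each merge collapses everything into a single segment and that deletes arriving during a merge end up on the merged segment.

```java
class EndlessOptimizeSketch {

    /**
     * Counts the merges performed when the policy treats any segment with
     * pending deletes as "not optimized". deleteRounds is how many times the
     * application deletes documents while a merge is running (hypothetical
     * parameter, used only to keep the simulation finite).
     */
    static int countMerges(int initialSegments, int deleteRounds) {
        int segments = initialSegments;
        boolean pendingDeletes = false;
        int merges = 0;
        // Keep merging while there is more than one segment, or while the
        // single remaining segment still carries deletes.
        while (segments > 1 || pendingDeletes) {
            segments = 1; // simplification: one merge collapses everything
            merges++;
            // Deletes that arrive during the merge land on the new segment.
            pendingDeletes = deleteRounds > 0;
            if (deleteRounds > 0) {
                deleteRounds--;
            }
        }
        return merges;
    }

    public static void main(String[] args) {
        System.out.println(countMerges(5, 0)); // no concurrent deletes
        System.out.println(countMerges(5, 3)); // deletes during three merges
    }
}
```

In this model every round of concurrent deletes costs one extra single-segment merge, so an application that never stops deleting never lets the loop exit.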

      1. LUCENE-3197.patch
        18 kB
        Michael McCandless

        Activity

        Michael McCandless added a comment -

        One simple way to fix this would be to have IW disregard the MergePolicy if ever it asks to do a single-segment merge of a segment that had already been produced by merging for the current optimize call.

        But... I don't really like this, as there could be some unusual MergePolicy out there that sometimes wants to do such merging.

        So I think a better solution, but API breaking to the MergePolicy, which is OK because it's @experimental, is to change the segmentsToOptimize argument; currently it's just a set recording which segments need to be optimized away. I think we should change it to a Map<String,Boolean>, where the Boolean indicates whether this segment had been created by a merge in the current optimize session. Then I'll fix our MPs to not cascade in such a case.
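
The proposed `Map<String,Boolean>` can be sketched as follows. This is a minimal illustration under stated assumptions, not the attached patch or Lucene's actual MergePolicy API: the `Segment` class, the method shape, and the decision rules are hypothetical. The only idea taken from the comment is that the Boolean marks segments created by a merge in the current optimize session, and that such a segment should not trigger another single-segment merge.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class OptimizeCascadeSketch {

    /** Hypothetical stand-in for a segment: its name and whether it has pending deletes. */
    static final class Segment {
        final String name;
        final boolean hasDeletions;

        Segment(String name, boolean hasDeletions) {
            this.name = name;
            this.hasDeletions = hasDeletions;
        }
    }

    /**
     * Returns the segments to merge next, or an empty list when the optimize is done.
     * segmentsToOptimize maps a segment name to true if that segment was produced
     * by a merge during the current optimize session.
     */
    static List<Segment> findMergesForOptimize(List<Segment> segments,
                                               Map<String, Boolean> segmentsToOptimize) {
        List<Segment> eligible = new ArrayList<>();
        for (Segment s : segments) {
            if (segmentsToOptimize.containsKey(s.name)) {
                eligible.add(s);
            }
        }
        if (eligible.size() > 1) {
            return eligible; // still several segments left to merge away
        }
        if (eligible.size() == 1) {
            Segment only = eligible.get(0);
            boolean mergedInThisOptimize = segmentsToOptimize.get(only.name);
            // Re-merge a lone segment only if its deletes predate this optimize.
            if (only.hasDeletions && !mergedInThisOptimize) {
                return eligible;
            }
        }
        return new ArrayList<>(); // nothing left to do: do not cascade
    }

    public static void main(String[] args) {
        List<Segment> index = List.of(new Segment("_5", true));
        System.out.println(findMergesForOptimize(index, Map.of("_5", true)).size());
        System.out.println(findMergesForOptimize(index, Map.of("_5", false)).size());
    }
}
```

With the flag set to `true` the lone segment is left alone even though it has picked up deletes, so the optimize ends. With `false` it is merged once to drop deletes that existed before the optimize began.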

        Hoss Man added a comment -

        Is the possibility of a never-ending optimize in this situation (never-ending deletes) really something we need to "fix"?

        I mean... isn't this what the user should expect? They've asked for a single segment without deletes, and while we try to give it to them they keep deleting. How is it bad that optimize doesn't stop until it's completely done?

        Yonik Seeley added a comment -

        Regardless of whether one views this as a bug or not, I think the more useful semantics are to at least "merge all of the current segments into 1 and remove all currently deleted docs" (i.e. I agree with Mike). The alternative is that optimize is dangerous in the presence of index updates (i.e. applications should discontinue updates if they call optimize).

        Michael McCandless added a comment -

        Right, this has been the intended semantics of a background optimize for some time, ie, when it returns it only ensures that whatever was not optimized as of when it was called has been merged away.

        This already works correctly for newly added docs, meaning if you continue adding docs / flushing new segments while the optimize runs, it knows that the newly flushed segments do not have to be merged away.

        But for new deletions we are not handling it correctly, which leads to the forever running merges.
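
The "as of when it was called" semantics for newly flushed segments can be illustrated with a snapshot of segment names. This is a simplified sketch, not how IndexWriter tracks segments internally: `OptimizeSnapshotSketch`, `snapshot`, and `isDone` are hypothetical names, and the model ignores the merged segment and deletions entirely.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class OptimizeSnapshotSketch {

    /** Records the names of the segments that exist when optimize is called. */
    static Set<String> snapshot(List<String> segmentsAtCall) {
        return new HashSet<>(segmentsAtCall);
    }

    /**
     * In this simplified model the optimize is finished once none of the
     * snapshotted segments remain in the index. Segments flushed after the
     * call are not in the snapshot, so they never hold it up.
     */
    static boolean isDone(Set<String> toOptimize, List<String> currentSegments) {
        for (String name : currentSegments) {
            if (toOptimize.contains(name)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        Set<String> toOptimize = snapshot(List.of("_0", "_1", "_2"));
        // "_3" is the merged result, "_4" was flushed while the merge ran.
        System.out.println(isDone(toOptimize, List.of("_3", "_4")));
        // "_2" has not been merged away yet.
        System.out.println(isDone(toOptimize, List.of("_2", "_4")));
    }
}
```

New deletions have no counterpart in this snapshot, which is the gap the issue describes: a delete changes an existing segment instead of adding a new one.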

        Michael McCandless added a comment -

        Patch.

        Robert Muir added a comment -

        bulk close for 3.3


          People

          • Assignee:
            Michael McCandless
            Reporter:
            Michael McCandless