SOLR-8372: Canceled recovery can lead to data loss

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 5.5, 6.0
    • Component/s: None
    • Labels: None

      Description

      A recovery via index replication tells the update log to start buffering updates. If that recovery is canceled for whatever reason by the replica, RecoveryStrategy calls ulog.dropBufferedUpdates(), which stops buffering and places the UpdateLog back in active mode. If updates arrive from the leader after this point (and before RecoveryStrategy retries recovery), they will be processed as normal and added to the transaction log. If the server is then bounced, those last updates to the transaction log look normal (no FLAG_GAP) and can be used to determine which replica is more up to date.
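      As a minimal sketch of that failure window, assuming nothing about Solr's actual implementation beyond the state names quoted later in this thread and the dropBufferedUpdates() reset described above:

      {code:java}
      import java.util.ArrayList;
      import java.util.List;

      class UpdateLogSketch {
        enum State { ACTIVE, BUFFERING, APPLYING_BUFFERED, REPLAYING }

        State state = State.ACTIVE;
        final List<String> tlog = new ArrayList<>(); // stand-in for the transaction log

        void bufferUpdates()       { state = State.BUFFERING; }
        void dropBufferedUpdates() { state = State.ACTIVE; }   // the problematic reset

        void add(String update) {
          // In ACTIVE mode the entry carries no FLAG_GAP marker, so after a
          // restart the log looks complete and this replica can be judged the
          // most up to date even though its index is missing documents.
          tlog.add(update + (state == State.BUFFERING ? " [buffered]" : ""));
        }

        public static void main(String[] args) {
          UpdateLogSketch ulog = new UpdateLogSketch();
          ulog.bufferUpdates();          // recovery via index replication begins
          ulog.dropBufferedUpdates();    // recovery canceled: back to ACTIVE
          ulog.add("doc1 from leader");  // arrives before recovery is retried
          System.out.println(ulog.tlog); // logged as if the replica were healthy
        }
      }
      {code}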

      Attachments

      1. SOLR-8372.patch (7 kB, Yonik Seeley)

        Activity

        Yonik Seeley added a comment - edited

        I've been thinking about possible ways to deal with this:

        • stop updates at a higher level... if the distributed update processor knows it's not in the right state to accept updates, then reject them
          • this has problems with race conditions unless the check/reject is with the bucket lock held
        • keep buffering updates when recovery is canceled. When another call to bufferUpdates() is made, reset the starting position so we know where replay needs to start from.
        • introduce a new state into UpdateLog (the current states are REPLAYING, BUFFERING, APPLYING_BUFFERED, ACTIVE)
          • this new state would do what? Silently drop updates it receives? Throw an exception? The latter would seem to complicate things further if it could possibly cause another node to put us into LIR again.

        In anticipation of really hairy scenarios, keeping updates might be more useful than dropping them. So perhaps the "keep buffering" option is simplest, as it also avoids introducing another state (see the sketch below)? We should normally receive only a few more updates that were already in the pipeline when something happened to our recovery attempt (like the leader dying)?
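        A rough sketch of that "keep buffering" idea, with illustrative names (tlogPosition, recoveryStartPos) rather than Solr's actual fields:

        {code:java}
        // Hedged sketch: bufferUpdates() tolerates being called while already
        // BUFFERING and simply moves the replay start pointer forward.
        class BufferingSketch {
          enum State { ACTIVE, BUFFERING }

          State state = State.ACTIVE;
          long tlogPosition = 0;       // advances as updates are appended
          long recoveryStartPos = -1;  // where replay must begin

          synchronized void bufferUpdates() {
            // Allow re-entry instead of requiring ACTIVE: a restarted recovery
            // only needs to replay from this latest call onward, because
            // everything buffered earlier is covered by the new index snapshot.
            state = State.BUFFERING;
            recoveryStartPos = tlogPosition;
          }

          synchronized void add(String update) {
            tlogPosition++; // updates keep accumulating; none are dropped
          }
        }
        {code}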

        Mark Miller added a comment

        Buffering seems fine to me. Now that we start buffering even before peer sync, we are almost always in a buffering state anyway, other than the brief window between replaying buffered docs and starting recovery again.

        Shalin Shekhar Mangar added a comment

        Continuing to buffer updates seems safe, but why reset the starting position on another call to bufferUpdates()? Wouldn't the existing RecoveryInfo already have the right start position?

        Yonik Seeley added a comment

        Buffering is done during recovery-via-index-replication... the leader starts forwarding updates and then does a commit we can replicate from. If that whole process is canceled and restarted, we only need to replay from the last time bufferUpdates() was called (all of the previously buffered updates should be in the new index snapshot we are going to get).
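        To make that concrete, here is an illustrative retry loop (reusing BufferingSketch from the earlier sketch; this is not RecoveryStrategy's real control flow, and the two stub methods are hypothetical):

        {code:java}
        class RecoverySketch {
          final BufferingSketch ulog = new BufferingSketch();

          void recoverViaReplication() {
            while (true) {
              ulog.bufferUpdates();         // (re)set the replay start; the leader
                                            // keeps forwarding updates meanwhile
              try {
                replicateIndexFromLeader(); // stub: pull the leader's committed
                                            // snapshot, which already contains all
                                            // previously buffered documents
                replayBufferedUpdates();    // stub: apply only what arrived since
                                            // the most recent bufferUpdates() call
                return;
              } catch (RuntimeException canceledOrFailed) {
                // loop and buffer again rather than dropping buffered updates
                // and falling back to ACTIVE
              }
            }
          }

          void replicateIndexFromLeader() { /* stub */ }
          void replayBufferedUpdates()    { /* stub */ }
        }
        {code}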

        Are there other scenarios where this wouldn't be the right thing to do? I guess we need an audit of every place that bufferUpdates() is called.

        Mark Miller added a comment

        Shard splitting does use it.

        There is also a general Solr admin API that starts buffering. Looks like migrate uses that.

        Mark Miller added a comment - edited

        Looks like for shard splitting it is just entering buffering earlier, to be sure it happens before the core can receive updates.

        I'm not so sure what migrate is doing at first glance.

        For both of them, I'm curious where the replay or drop happens. Seems a little hairy to rely on the RecoveryStrategy replay or drop. (Though I guess it seems a whole lot safer now that we always buffer docs in recovery, even if peer sync works.)

        Shalin Shekhar Mangar added a comment

        Thanks for explaining that, Yonik. Yes, resetting the start position does seem like the right thing to do.

        Shard splitting does use it.

        Yes, the sub-shard leader sets its update log to buffering mode during core creation. The sub-shard leader has no one to recover from, so recovery is a no-op and it shouldn't be impacted by this change.

        There is also a general Solr admin API that start buffering. Looks like migrate uses that.

        Yes, that is used by migrate to enable buffering on the target collection's leader. After this call succeeds, a routing rule is added to the source collection which starts sending all relevant updates to the target collection.

        For both of them, I'm curious where the replay or drop happens. Seems a little hairy to rely on the RecoveryStrategy replay or drop. (Though I guess it seems a whole lot safer now that we always buffer docs in recovery, even if peer sync works.)

        Both use the REQUESTAPPLYUPDATES core admin action to apply buffered updates. RecoveryStrategy is not used here at all. If a migrate action fails, the target collection is still left in buffering state which shouldn't pose a problem if migrate is retried.
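        For illustration, a buffered-updates replay can be requested with a plain HTTP call to the core admin API; a minimal sketch, assuming a local node and core name, and noting that the exact request parameter ("name" here) may differ across Solr versions:

        {code:java}
        import java.io.BufferedReader;
        import java.io.InputStreamReader;
        import java.net.HttpURLConnection;
        import java.net.URL;

        public class ApplyBufferedUpdates {
          public static void main(String[] args) throws Exception {
            // Host, port, and core name are assumptions for the example.
            String url = "http://localhost:8983/solr/admin/cores"
                + "?action=REQUESTAPPLYUPDATES&name=target_shard1_replica1";
            HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
            try (BufferedReader in = new BufferedReader(
                new InputStreamReader(conn.getInputStream()))) {
              // A success response means buffered updates were replayed onto the index.
              in.lines().forEach(System.out::println);
            }
          }
        }
        {code}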

        Mark Miller added a comment

        Both use the REQUESTAPPLYUPDATES

        Ah, was looking for a cmd like that but did not see it.

        Are there other scenarios where this wouldn't be the right thing to do?

        Doesn't seem so then?

        Only thing I can think of is, what about the async nature of those core admin buffer calls - any chance of those overlapping with recovery strategy buffer calls? And if so, would that matter with this change?

        Yonik Seeley added a comment

        Here's a patch that allows bufferUpdates() to be called more than once, and removes the call to dropBufferedUpdates() from RecoveryStrategy.

        Previously, if bufferUpdates() was called with state != ACTIVE, we simply returned without changing the state. This case is at least logged now.

        This has an additional side effect of having buffered versions in our log that were never applied to the index. This seems OK though... better not to lose updates in general.
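        The described behavior, sketched (this mirrors the patch's description, not its actual diff; the logging and field names are illustrative):

        {code:java}
        import java.util.logging.Logger;

        class PatchedUpdateLogSketch {
          enum State { ACTIVE, BUFFERING, APPLYING_BUFFERED, REPLAYING }
          private static final Logger log = Logger.getLogger("UpdateLogSketch");

          State state = State.ACTIVE;
          long tlogPosition = 0;
          long recoveryStartPos = -1;

          synchronized void bufferUpdates() {
            if (state == State.BUFFERING) {
              recoveryStartPos = tlogPosition; // repeated call: move replay start forward
              return;
            }
            if (state != State.ACTIVE) {
              // Previously a silent no-op; now at least logged.
              log.warning("Unexpected state for bufferUpdates: " + state);
              return;
            }
            state = State.BUFFERING;
            recoveryStartPos = tlogPosition;
          }
          // The cancel path no longer calls anything like dropBufferedUpdates(),
          // so buffered versions may remain in the log without ever reaching
          // the index: the accepted side effect mentioned above.
        }
        {code}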

        ASF subversion and git services added a comment

        Commit 1720250 from Yonik Seeley in branch 'dev/trunk'
        [ https://svn.apache.org/r1720250 ]

        SOLR-8372: continue buffering if recovery is canceled/failed

        ASF subversion and git services added a comment

        Commit 1720251 from Yonik Seeley in branch 'dev/branches/branch_5x'
        [ https://svn.apache.org/r1720251 ]

        SOLR-8372: continue buffering if recovery is canceled/failed

        Mike Drob added a comment

        Yonik - it looks like logReplayFinish is never used; is that an oversight?

        Also, logReplay might need to only release one permit before applying buffered updates, so that you can release again before the second apply?
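        For reference, the one-permit-per-step gating pattern Mike describes, as a generic sketch (logReplay and logReplayFinish here are stand-ins for the test hooks, not Solr's actual fields):

        {code:java}
        import java.util.concurrent.Semaphore;

        public class ReplayGateSketch {
          static final Semaphore logReplay = new Semaphore(0);       // gates each replay
          static final Semaphore logReplayFinish = new Semaphore(0); // signals completion

          static void applyBufferedUpdates() throws InterruptedException {
            logReplay.acquire();       // block until the test grants one permit
            /* ... replay buffered updates ... */
            logReplayFinish.release(); // let the test observe this apply finished
          }

          public static void main(String[] args) throws Exception {
            Thread replica = new Thread(() -> {
              try { applyBufferedUpdates(); applyBufferedUpdates(); }
              catch (InterruptedException ignored) { }
            });
            replica.start();
            logReplay.release();       // first apply; releasing only one permit
            logReplayFinish.acquire(); // at a time keeps the second apply
            logReplay.release();       // separately controllable
            logReplayFinish.acquire();
            replica.join();
          }
        }
        {code}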


          People

          • Assignee: Unassigned
          • Reporter: Yonik Seeley
          • Votes: 0
          • Watchers: 7
