Solr / SOLR-7427

Recovery can miss some updates when they're neither forwarded nor present in replicated index


    Details

      Description

      This issue follows from the discussion in SOLR-7141. See Yonik Seeley's comment at https://issues.apache.org/jira/browse/SOLR-7141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14501622#comment-14501622, quoted here:

      From memory, here's how it's supposed to work:
      1. replica tells leader it wants to recover
      2. leader starts forwarding updates to replica (which the replica buffers since it's in recovery)
      3. leader executes a hard commit (so replica can replicate the current index)
      4. replica starts replicating index from the last leader commit point

      Note that the ordering of #2 and #3 is very important. If we did #3 first and #2 after, some updates would neither make it into the commit nor be forwarded to the replica, and that leads to data loss.

      Now the issue: even though we do #2 first and #3 after... it's possible to have an unfortunately scheduled update in a different thread that started before we did #2, and doesn't complete until after #3, so that update was not forwarded, and it's also not in the replicated index. The sleep (which should be between steps #2 and #3) is to try and give time for this update to complete and make it into the index.
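
      The interleaving described above can be replayed deterministically in a toy model. The sketch below is not Solr code: `LeaderModel`, `RecoveryRaceDemo`, and every method name are hypothetical, and an update is split into a "begin" half (where the forwarding decision is taken) and a "finish" half (where the document reaches the index) purely to make the scheduling explicit.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Toy, single-threaded model of the leader. All names are hypothetical. */
class LeaderModel {
    final List<String> index = new ArrayList<>();          // documents written on the leader
    final List<String> replicaBuffer = new ArrayList<>();  // updates forwarded to the recovering replica
    List<String> commitPoint = new ArrayList<>();          // snapshot the replica will replicate
    boolean forwarding = false;

    /** First half of an update: the forwarding decision is taken from the state seen now. */
    boolean beginUpdate() {
        return forwarding;
    }

    /** Second half of an update: write to the index, forward only if decided at the start. */
    void finishUpdate(String doc, boolean forwardDecision) {
        index.add(doc);
        if (forwardDecision) {
            replicaBuffer.add(doc);
        }
    }

    void startForwarding() { forwarding = true; }                  // step 2
    void hardCommit() { commitPoint = new ArrayList<>(index); }    // step 3

    /** What the replica ends up with: replicated commit point plus replayed buffer. */
    Set<String> replicaView() {
        Set<String> docs = new HashSet<>(commitPoint);
        docs.addAll(replicaBuffer);
        return docs;
    }
}

class RecoveryRaceDemo {
    /** Returns the documents the replica is missing after the unlucky interleaving. */
    static Set<String> missedUpdates() {
        LeaderModel leader = new LeaderModel();
        leader.finishUpdate("doc-1", leader.beginUpdate());  // completes before recovery starts

        boolean decision = leader.beginUpdate();  // doc-2 starts before step 2: will not be forwarded
        leader.startForwarding();                 // step 2
        leader.hardCommit();                      // step 3: doc-2 is not in the index yet
        leader.finishUpdate("doc-2", decision);   // doc-2 lands only after the commit

        leader.finishUpdate("doc-3", leader.beginUpdate());  // starts after step 2: forwarded

        Set<String> missing = new HashSet<>(leader.index);
        missing.removeAll(leader.replicaView());
        return missing;
    }
}
```

      In this model `RecoveryRaceDemo.missedUpdates()` contains only `doc-2`: it is on the leader, but it is in neither the commit point nor the forwarded buffer. A sleep between steps #2 and #3 only makes it more likely that `finishUpdate` runs before `hardCommit`; it does not rule the interleaving out.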

      It occurs to me that the lucene IndexWriter thread stealing (same issue that caused this: SOLR-6820) could make this much more likely than we would have thought.

      One possible alternative is to block updates for a commit of this type (replication commit). Any blocked updates would need to see that they need to be forwarded to the replica too (once they are unblocked) - I don't know if the code is currently written that way.
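
      One way to picture this alternative is a read/write lock: updates hold the read side for their whole duration, and the replication commit takes the write side. This is only a sketch under that assumption. `BlockingLeaderSketch` and its methods are hypothetical, and nothing here reflects how Solr's update path is actually written.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Hypothetical leader that blocks updates while a replication commit runs. */
class BlockingLeaderSketch {
    private final ReentrantReadWriteLock commitLock = new ReentrantReadWriteLock();
    private final List<String> index = new ArrayList<>();
    private final List<String> forwarded = new ArrayList<>();
    private List<String> commitPoint = new ArrayList<>();
    private boolean forwarding = false;

    /** A multi-step update; the read lock keeps all steps on one side of the commit. */
    void update(String doc) {
        commitLock.readLock().lock();
        try {
            boolean forward;
            synchronized (this) { forward = forwarding; }    // decide whether to forward
            synchronized (this) { index.add(doc); }          // write to the index
            if (forward) {
                synchronized (this) { forwarded.add(doc); }  // forward to the replica
            }
        } finally {
            commitLock.readLock().unlock();
        }
    }

    /** Waits for in-flight updates, then enables forwarding and snapshots the index. */
    void replicationCommit() {
        commitLock.writeLock().lock();
        try {
            synchronized (this) {
                forwarding = true;
                commitPoint = new ArrayList<>(index);
            }
        } finally {
            commitLock.writeLock().unlock();
        }
    }

    /** Documents on the leader that are in neither the commit point nor the forwarded set. */
    synchronized Set<String> missingOnReplica() {
        Set<String> missing = new HashSet<>(index);
        missing.removeAll(commitPoint);
        missing.removeAll(forwarded);
        return missing;
    }

    /** Runs concurrent writers around one replication commit and reports what was missed. */
    static Set<String> demo() {
        BlockingLeaderSketch leader = new BlockingLeaderSketch();
        Thread[] writers = new Thread[4];
        for (int t = 0; t < writers.length; t++) {
            final int id = t;
            writers[t] = new Thread(() -> {
                for (int i = 0; i < 1000; i++) {
                    leader.update("w" + id + "-" + i);
                }
            });
            writers[t].start();
        }
        leader.replicationCommit();
        try {
            for (Thread w : writers) {
                w.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return leader.missingOnReplica();
    }
}
```

      Because an update that was blocked by the commit only reads the forwarding flag after it acquires the read lock, it sees that it must be forwarded. An update that acquired the lock earlier finishes before the snapshot is taken, so it is in the commit point. The `synchronized` blocks only protect the plain lists; the ordering guarantee comes from the read/write lock.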

      So there is some protection against such a situation, but it relies on two timing-based mechanisms:

      1. The replica stalls recovery until the leader acknowledges that it has indeed seen the replica in 'recovery' (via the prep recovery core admin API)
      2. The replica sleeps for 7 seconds by default (configurable via the hidden "solr.cloud.wait-for-updates-with-stale-state-pause" system property) after prep recovery completes, to give additional time for such updates to complete.
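
      As a rough illustration of the second point, a pause driven by that system property could look like the sketch below. The property name and the 7 second default come from the description above; the class and method names are hypothetical, and treating the value as milliseconds is an assumption.

```java
/** Hypothetical helper showing a system-property-driven pause after prep recovery. */
class StaleStatePause {
    static final String PROP = "solr.cloud.wait-for-updates-with-stale-state-pause";
    static final int DEFAULT_MS = 7000;  // 7 seconds; the millisecond unit is an assumption

    /** Reads the pause length, falling back to the default when the property is unset. */
    static int pauseMillis() {
        return Integer.getInteger(PROP, DEFAULT_MS);
    }

    /** Sleeps so that updates started before forwarding began have time to complete. */
    static void pauseAfterPrepRecovery() throws InterruptedException {
        Thread.sleep(pauseMillis());
    }
}
```

      Under the same assumption about units, starting the JVM with `-Dsolr.cloud.wait-for-updates-with-stale-state-pause=10000` would lengthen the pause to 10 seconds. A longer pause narrows the window for a missed update but cannot close it.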

              People

              • Assignee: Unassigned
              • Reporter: Shalin Shekhar Mangar
              • Votes: 0
              • Watchers: 5
