SOLR-7141

RecoveryStrategy: Raise time that we wait for any updates from the leader before they saw the recovery state to have finished.

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 5.1, 6.0
    • Component/s: None
    • Labels: None

      Description

      The current wait of 3 seconds is pushing the envelope a bit.

          Activity

          Mark Miller added a comment -
                  // we wait a bit so that any updates on the leader
                  // that started before they saw recovering state 
                  // are sure to have finished
                  try {
                    Thread.sleep(3000);
                  } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                  }
          
          Mark Miller added a comment -

          I guess we should probably go up to like 10 seconds. Longer term, perhaps there is something else we can try that would better deal with a really badly timed long GC or something.

          Mark Miller added a comment -

          I'll also make it configurable.
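A configurable wait might be read from a system property along these lines. This is only a sketch; the property name `solr.recovery.waitBeforeReplicateSeconds`, the class name, and the 10-second default are illustrative, not necessarily what the committed code uses:

```java
// Sketch: make the pre-replication wait tunable via a system property so
// tests and operators can adjust it. Property name and default are
// hypothetical, not Solr's actual configuration.
public class RecoveryWait {
    static final int DEFAULT_WAIT_SECONDS = 10;

    static int waitSeconds() {
        // Integer.getInteger returns the parsed system property,
        // or the supplied default when the property is unset.
        return Integer.getInteger("solr.recovery.waitBeforeReplicateSeconds",
                                  DEFAULT_WAIT_SECONDS);
    }

    static void waitForLeaderUpdates() {
        // we wait a bit so that any updates on the leader that started
        // before they saw recovering state are sure to have finished
        try {
            Thread.sleep(waitSeconds() * 1000L);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

Reading the value through a system property keeps the default behavior unchanged while letting chaos-monkey style tests shrink the wait.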

          ASF subversion and git services added a comment -

          Commit 1668396 from Mark Miller in branch 'dev/trunk'
          [ https://svn.apache.org/r1668396 ]

          SOLR-7141: RecoveryStrategy: Raise time that we wait for any updates from the leader before they saw the recovery state to have finished.

          ASF subversion and git services added a comment -

          Commit 1668884 from Mark Miller in branch 'dev/branches/branch_5x'
          [ https://svn.apache.org/r1668884 ]

          SOLR-7141: RecoveryStrategy: Raise time that we wait for any updates from the leader before they saw the recovery state to have finished.

          Timothy Potter added a comment -

          Bulk close after 5.1 release

          Shalin Shekhar Mangar added a comment -

          Hi Mark, can you please explain why this wait is necessary?

          Mark Miller added a comment -

          It's as the comment says:

                  // we wait a bit so that any updates on the leader
                  // that started before they saw recovering state 
                  // are sure to have finished
          

          Slow updates based on old state can be wrong. We started this at 2 seconds, picked more or less out of thin air. I had to raise it because it wasn't enough in some HDFS ChaosMonkey test failures, so we need to make sure we have plenty of padding for real-world hardware.

          It might be nice to do something more concrete, but it's very tricky to solve nicely.

          Shalin Shekhar Mangar added a comment -

          Thanks Mark. What do you mean by "wrong" in this context? If the leader saw this node's state as down then it wouldn't send any updates our way; if it saw this node's state as recovering or active then it would send the exact same request. What kinds of requests or scenarios are we trying to prevent by waiting?

          Yonik Seeley added a comment -

          It's tricky.

          From memory, here's how it's supposed to work:
          1. replica tells leader it wants to recover
          2. leader starts forwarding updates to replica (which the replica buffers since it's in recovery)
          3. leader executes a hard commit (so replica can replicate the current index)
          4. replica starts replicating index from the last leader commit point

          Note that the ordering of #2 and #3 is very important. If we did #3 first and then #2 after, some updates won't make it into the commit and also won't be forwarded to the replica (and that leads to data loss).

          Now the issue: even though we do #2 first and #3 after, it's possible to have an unfortunately scheduled update in a different thread that started before we did #2 and didn't complete until after #3, so that update was not forwarded and is also not in the replicated index. The sleep (which should be between steps #2 and #3) is there to give such an update time to complete and make it into the index.

          It occurs to me that the lucene IndexWriter thread stealing (same issue that caused this: SOLR-6820) could make this much more likely than we would have thought.

          One possible alternative is to block updates for a commit of this type (replication commit). Any blocked updates would need to see that they need to be forwarded to the replica too (once they are unblocked) - I don't know if the code is currently written that way.
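The four steps above can be modeled with a toy sketch showing why the ordering matters: once forwarding is on before the hard commit, every update lands in either the committed snapshot or the replica's buffer. The class and method names here are illustrative, not Solr's actual recovery code, and the single-threaded model deliberately cannot reproduce the in-flight-update race the sleep is padding against:

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of the leader/replica recovery handoff. Names are
// hypothetical; real Solr buffers into the update log and dedupes
// replayed updates by version.
public class RecoverySketch {
    final List<String> leaderIndex = new ArrayList<>();
    final List<String> replicaBuffer = new ArrayList<>();
    List<String> committedSnapshot = new ArrayList<>();
    boolean forwarding = false;   // step 2 flips this on

    void update(String doc) {
        leaderIndex.add(doc);
        if (forwarding) {
            replicaBuffer.add(doc);   // forwarded; replica buffers it
        }
    }

    void hardCommit() {
        // step 3: snapshot the index so the replica can replicate it
        committedSnapshot = new ArrayList<>(leaderIndex);
    }

    List<String> recoveredReplica() {
        // step 4: replicate the snapshot, then replay the buffer
        List<String> replica = new ArrayList<>(committedSnapshot);
        replica.addAll(replicaBuffer);
        return replica;
    }
}
```

If `hardCommit()` ran before `forwarding` was enabled, an update in between would be in neither the snapshot nor the buffer, which is the data-loss case described above. Updates that are in both (here, duplicated on replay) are harmless in real Solr, which presumably drops them via version checks.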

          Shalin Shekhar Mangar added a comment -

          Thanks for explaining in detail, Yonik. This is indeed tricky. Shard splitting can suffer from the same problem. I'll open an issue to track this.

          Shalin Shekhar Mangar added a comment -

          I opened SOLR-7427


            People

            • Assignee: Mark Miller
            • Reporter: Mark Miller
            • Votes: 0
            • Watchers: 5
