Solr / SOLR-3772

On cluster startup, we should wait until we see all registered replicas before running the leader process - or if they all do not come up, N amount of time.

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 4.0, 6.0
    • Component/s: SolrCloud
    • Labels:
      None

      Description

      This seems like a simple and good way to start.
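The behavior described in the summary (wait for every registered replica, but give up after a bounded time) could be sketched roughly as follows. This is only an illustration under stated assumptions, not the code committed for this issue: the class `ReplicaStartupWait`, the method `waitForReplicas`, and the `liveReplicas` supplier are hypothetical names, and the 3 minute constant simply mirrors the default mentioned in the comments on this issue.

```java
import java.time.Duration;
import java.util.Set;
import java.util.function.Supplier;

// Hypothetical sketch; none of these names come from the Solr code base.
public class ReplicaStartupWait {

    // Assumed default, mirroring the 3 minute wait discussed in the comments.
    static final Duration DEFAULT_WAIT = Duration.ofMinutes(3);

    /**
     * Polls until every expected replica is reported as live, or until the
     * timeout elapses. Returns true if all replicas were seen, false if the
     * wait timed out or the thread was interrupted.
     */
    static boolean waitForReplicas(Set<String> expected,
                                   Supplier<Set<String>> liveReplicas,
                                   Duration timeout,
                                   Duration pollInterval) {
        long deadline = System.nanoTime() + timeout.toNanos();
        while (true) {
            if (liveReplicas.get().containsAll(expected)) {
                return true; // every registered replica has shown up
            }
            long remainingNanos = deadline - System.nanoTime();
            if (remainingNanos <= 0) {
                return false; // gave up; the caller proceeds without the stragglers
            }
            long remainingMillis = Math.max(1, remainingNanos / 1_000_000);
            try {
                // Never sleep past the deadline.
                Thread.sleep(Math.min(pollInterval.toMillis(), remainingMillis));
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt flag
                return false;
            }
        }
    }
}
```

In this sketch the caller would run the leader process after the method returns in either case; the boolean only records whether all replicas were seen or the wait timed out.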

          Activity

          Mark Miller added a comment -

          Like SOLR-3750 (see recent comment on that issue), I plan to make this off by default for now, with the ability to enable it and configure the wait time.

          Mark Miller added a comment -

          After further thought, we don't need to worry about the expiration case, and this one should be on by default. Unless someone convinces me otherwise, I've gone with a default of a 3-minute wait.

          Commit Tag Bot added a comment -

          [branch_4x commit] Mark Robert Miller
          http://svn.apache.org/viewvc?view=revision&revision=1384937

          SOLR-3833: When an election is started because a leader went down, the new leader candidate should decline if the last state they published was not active.

          SOLR-3836: When doing peer sync, we should only count sync attempts that cannot reach the given host as success when the candidate leader is syncing with the replicas - not when replicas are syncing to the leader.

          SOLR-3835: In our leader election algorithm, if on connection loss we found we did not create our election node, we should retry, not throw an exception.

          SOLR-3834: A new leader on cluster startup should also run the leader sync process in case there was a bad cluster shutdown.

          SOLR-3772: On cluster startup, we should wait until we see all registered replicas before running the leader process - or if they all do not come up, N amount of time.

          SOLR-3756: If we are elected the leader of a shard, but we fail to publish this for any reason, we should clean up and re-trigger a leader election.

          SOLR-3812: ConnectionLoss during recovery can cause lost updates, leading to shard inconsistency.

          SOLR-3813: When a new leader syncs, we need to ask all shards to sync back, not just those that are active.

          SOLR-3807: Currently during recovery we pause for a number of seconds after waiting for the leader to see a recovering state so that any previous updates will have finished before our commit on the leader - we don't need this wait for peersync.

          SOLR-3837: When a leader is elected and asks replicas to sync back to him and that fails, we should ask those nodes to recover asynchronously rather than synchronously.

          Uwe Schindler added a comment -

          Closed after release.


            People

            • Assignee: Mark Miller
            • Reporter: Mark Miller
            • Votes: 0
            • Watchers: 2
