[branch_4x commit] Mark Robert Miller
SOLR-3833: When a election is started because a leader went down, the new leader candidate should decline if the last state they published was not active.
SOLR-3836: When doing peer sync, we should only count sync attempts that cannot reach the given host as success when the candidate leader is syncing with the replicas - not when replicas are syncing to the leader.
SOLR-3835: In our leader election algorithm, if on connection loss we found we did not create our election node, we should retry, not throw an exception.
SOLR-3834: A new leader on cluster startup should also run the leader sync process in case there was a bad cluster shutdown.
SOLR-3772: On cluster startup, we should wait until we see all registered replicas before running the leader process - or if they all do not come up, N amount of time.
SOLR-3756: If we are elected the leader of a shard, but we fail to publish this for any reason, we should clean up and re trigger a leader election.
SOLR-3812: ConnectionLoss during recovery can cause lost updates, leading to shard inconsistency.
SOLR-3813: When a new leader syncs, we need to ask all shards to sync back, not just those that are active.
SOLR-3807: Currently during recovery we pause for a number of seconds after waiting for the leader to see a recovering state so that any previous updates will have finished before our commit on the leader - we don't need this wait for peersync.
SOLR-3837: When a leader is elected and asks replicas to sync back to him and that fails, we should ask those nodes to recovery asynchronously rather than synchronously.