[SOLR-5593] shard leader loss due to ZK session expiry

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 4.7, 6.0
    • Component/s: SolrCloud
    • Labels: None

      Description

      The problem we saw was that the shard leader ceased to be shard leader (in our case due to its ZooKeeper session expiring). The followers thus rejected update requests (DistributedUpdateProcessor setupRequest's call to ZkStateReader getLeaderRetry) and the leader asked them to recover (DistributedUpdateProcessor doFinish). The followers published themselves as recovering (CoreAdminHandler handleRequestRecoveryAction) and the shard leader loss triggered an election in which none of the followers became the leader due to their recovering state (ShardLeaderElectionContext shouldIBeLeader). The former shard leader also did not become shard leader because its new seq number placed it after the existing replicas (LeaderElector checkIfIamLeader seq <= intSeqs.get(0)).
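
      For illustration, a small self-contained toy (not the actual Solr source; the class and method names below are made up for this sketch) of the ordering rule behind the LeaderElector checkIfIamLeader check mentioned above: only the election node with the lowest ZooKeeper sequence number may attempt to become leader, and a re-registered former leader always receives a new, higher number.

        import java.util.Arrays;
        import java.util.Collections;
        import java.util.List;

        // Toy model of the "seq <= intSeqs.get(0)" election check (illustrative only,
        // not Solr source).
        public class ElectionOrderDemo {
            static boolean mayAttemptLeadership(int mySeq, List<Integer> allSeqs) {
                List<Integer> sorted = new java.util.ArrayList<>(allSeqs);
                Collections.sort(sorted);
                // Only the node holding the lowest sequence number is allowed to try.
                return !sorted.isEmpty() && mySeq <= sorted.get(0);
            }

            public static void main(String[] args) {
                // Followers registered earlier hold seqs 1 and 2; the former leader's session
                // expired, so it re-registered and was assigned the new, higher seq 3.
                List<Integer> seqs = Arrays.asList(1, 2, 3);
                System.out.println(mayAttemptLeadership(3, seqs)); // false: former leader waits in line
                System.out.println(mayAttemptLeadership(1, seqs)); // true: but seq 1 is stuck in "recovering"
            }
        }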

      Attachments

      1. CoreAdminHandler.patch (2 kB) - Christine Poerschke


          Activity

          Christine Poerschke added a comment -

          Attaching one potential solution (we are investigating others):

          As part of the recovery process, state=recovering publishing already happens (RecoveryStrategy doRecovery), but only after a shard leader to recover from has been found. If the CoreAdminHandler handleRequestRecoveryAction publish had not happened, then one of the followers should have been elected shard leader.
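
          To make the effect concrete, here is a minimal toy model (illustrative only, not the attached patch; the real check lives in ShardLeaderElectionContext shouldIBeLeader): a replica whose last published state is not active refuses leadership, so publishing state=recovering before the election runs leaves the shard leaderless, whereas deferring the publish lets a follower win.

            import java.util.LinkedHashMap;
            import java.util.Map;

            // Toy model (illustrative only, not Solr source) of the "last published state"
            // check that blocks recovering replicas from winning the leader election.
            public class RecoveringStateDemo {
                static boolean shouldIBeLeader(String lastPublishedState) {
                    return "active".equals(lastPublishedState); // simplified version of the real check
                }

                public static void main(String[] args) {
                    Map<String, String> withEarlyPublish = new LinkedHashMap<>();
                    withEarlyPublish.put("replica1", "recovering"); // published by handleRequestRecoveryAction
                    withEarlyPublish.put("replica2", "recovering");

                    Map<String, String> withoutEarlyPublish = new LinkedHashMap<>();
                    withoutEarlyPublish.put("replica1", "active");  // publish deferred to RecoveryStrategy doRecovery
                    withoutEarlyPublish.put("replica2", "active");

                    withEarlyPublish.forEach((r, s) -> System.out.println(r + " may lead: " + shouldIBeLeader(s)));    // all false
                    withoutEarlyPublish.forEach((r, s) -> System.out.println(r + " may lead: " + shouldIBeLeader(s))); // all true
                }
            }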

          Mark Miller added a comment -

          Great find!

          The former shard leader also did not become shard leader because its new seq number placed it after the existing replicas (LeaderElector checkIfIamLeader seq <= intSeqs.get(0)).

          This I'm surprised to see - when someone cannot be the leader, for instance if they published a non-active state last, they should get back in line - the original leader should have its chance again...

          Mark Miller added a comment -

          I think your change is probably good in general. Let's see what else we can do here.

          One thing that seems kind of silly is that those replicas reject the updates at all. It seems like perhaps we should relax things a bit so that they would be accepted.

          Christine Poerschke added a comment -

          One thing that seems kind of silly is that those replicas reject the updates at all. It seems like perhaps we should relax things a bit so that they would be accepted.

          Yes, we are working on changes to DistributedUpdateProcessor to relax the requirement for the getLeaderRetry to succeed within setupRequest (if phase is DistribPhase.FROMLEADER and the shard state shows it could not be subShardLeader then getLeaderRetry success should be optional).
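
          Sketching the shape of that relaxation (illustrative only; getLeaderRetry below is a stand-in that simulates the leaderless shard, not the real ZkStateReader API): attempt the leader lookup as today, but if it fails and the update was forwarded from the leader to a core that cannot be a sub-shard leader, carry on without a resolved leader instead of rejecting the update.

            // Illustrative sketch only, not the eventual DistributedUpdateProcessor change.
            public class OptionalLeaderLookupDemo {
                static String lookupLeaderOrNull(boolean phaseIsFromLeader, boolean couldBeSubShardLeader) {
                    try {
                        return getLeaderRetry();
                    } catch (RuntimeException noLeaderYet) {
                        if (phaseIsFromLeader && !couldBeSubShardLeader) {
                            return null; // relaxed: accept the forwarded update without a resolved leader
                        }
                        throw noLeaderYet; // otherwise keep the old behaviour and reject the update
                    }
                }

                static String getLeaderRetry() { // stand-in for ZkStateReader getLeaderRetry
                    throw new RuntimeException("no leader elected (simulating the leaderless shard)");
                }

                public static void main(String[] args) {
                    System.out.println(lookupLeaderOrNull(true, false)); // null: the update would be accepted
                }
            }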

          Steve Rowe added a comment -

          I didn't mean to reassign - I was typing in another window, accidentally hit the mouse button with my elbow, which focused the browser window with this issue up, and then I guess JIRA interpreted my random typing as keyboard shortcuts ....

          Mark Miller added a comment -

          Yes, we are working on changes to DistributedUpdateProcessor to relax the requirement for the getLeaderRetry to succeed within setupRequest (if phase is DistribPhase.FROMLEADER and the shard state shows it could not be subShardLeader then getLeaderRetry success should be optional).

          Yeah, on some thought, this is the right approach I think. Removing the publish is probably not a good idea. It actually protects us from losing data - we don't want a replica that was asked to recover to become the leader, because it may be missing updates that were accepted and that it is expected to have. If the previous leader died before one of the replicas became the leader, that leader might have been ahead. In this case, we don't choose a new leader, because you should really restart the whole shard with all the replicas you can, to avoid any possible data loss.

          Mark Miller added a comment -

          within setupRequest

          The tough case seems to be nailing delete by query - I've been peeking a bit at it.

          Mark Miller added a comment -

          Yes, we are working on changes to DistributedUpdateProcessor to relax

          Any progress? I'll likely look at attacking this soon.

          Christine Poerschke added a comment -

          Uploaded https://github.com/apache/lucene-solr/pull/27 which, rather than relaxing the error handling for the getLeaderRetry call, tries to avoid the call entirely in the first place (if circumstances seem to permit it, i.e. the request said it came from the leader, we don't think we are the leader, and we could not be a sub-shard leader).
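
          Roughly, the decision the pull request is described as making (a simplification with made-up flag names, not the PR itself): skip the getLeaderRetry lookup in setupRequest altogether when all three of those conditions hold.

            // Illustrative simplification of the "skip the leader lookup" condition described
            // above; the flag names are made up for this sketch and do not appear in the PR.
            public class SkipLeaderLookupDemo {
                static boolean needLeaderLookup(boolean requestSaysFromLeader,
                                                boolean weThinkWeAreLeader,
                                                boolean couldBeSubShardLeader) {
                    return !(requestSaysFromLeader && !weThinkWeAreLeader && !couldBeSubShardLeader);
                }

                public static void main(String[] args) {
                    // A follower receiving a forwarded update from a leader that just lost its ZK session:
                    System.out.println(needLeaderLookup(true, false, false));  // false: apply the update, no lookup
                    // The normal path, e.g. a request that was not forwarded by the leader:
                    System.out.println(needLeaderLookup(false, false, false)); // true: resolve the leader as before
                }
            }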

          Mark Miller added a comment -

          Great, thanks Christine! Patch looks good on first glance.

          ASF subversion and git services added a comment -

          Commit 1565049 from Mark Miller in branch 'dev/trunk'
          [ https://svn.apache.org/r1565049 ]

          SOLR-5593: Replicas should accept the last updates from a leader that has just lost its connection to ZooKeeper.

          ASF subversion and git services added a comment -

          Commit 1565053 from Mark Miller in branch 'dev/branches/branch_4x'
          [ https://svn.apache.org/r1565053 ]

          SOLR-5593: Replicas should accept the last updates from a leader that has just lost its connection to ZooKeeper.

          Mark Miller added a comment -

          Thanks Christine!

          ASF GitHub Bot added a comment -

          Github user cpoerschke closed the pull request at:

          https://github.com/apache/lucene-solr/pull/27


            People

            • Assignee: Mark Miller
            • Reporter: Christine Poerschke
            • Votes: 0
            • Watchers: 10
