[SOLR-7109] Indexing threads stuck during network partition can put leader into down state - ASF JIRA


Details

Type: Bug
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: 4.10.3, 5.0
Fix Version/s: 5.1, 6.0
Component/s: SolrCloud
Labels: None

Description

While running some Jepsen tests recently, I found that some threads get stuck on ZooKeeper operations for a long time in the ZkController.updateLeaderInitiatedRecoveryState method, and when they wake up they go ahead and set the leader-initiated recovery (LIR) state to down. In the meantime, however, a new leader has been elected, and sometimes the leader itself ends up being put into recovery, which causes the shard to reject all writes.
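To make the race concrete, here is a minimal Python sketch of one way to guard against it, assuming a simplified in-memory model of the shard's ZooKeeper state. `ShardState`, `leader_term`, `elect`, `mark_down`, and `StaleLeaderError` are hypothetical names for illustration only; they are not Solr's API, and this is not the patch attached to this issue. The idea is that a thread captures the leader's term before doing slow work, and the LIR write is applied only if that term is still current, so a thread that wakes up after an election cannot mark the new leader down. In real ZooKeeper, a comparable check could plausibly be expressed with versioned writes (`setData` with an expected version) or a `multi` transaction containing a `check` op.

```python
import threading
from dataclasses import dataclass, field


class StaleLeaderError(Exception):
    """Raised when a caller acts on leadership it no longer holds."""


@dataclass
class ShardState:
    # Hypothetical in-memory stand-in for the shard's ZooKeeper state.
    leader: str
    leader_term: int = 1  # bumped on every election
    lir_state: dict = field(default_factory=dict)
    _lock: threading.Lock = field(
        default_factory=threading.Lock, repr=False, compare=False)

    def elect(self, new_leader: str) -> int:
        """Record a new leader and return its (incremented) term."""
        with self._lock:
            self.leader = new_leader
            self.leader_term += 1
            return self.leader_term

    def mark_down(self, caller: str, caller_term: int, replica: str) -> None:
        """Set a replica's LIR state to 'down' only if the caller still leads.

        The leadership check and the write happen under one lock, so a thread
        that captured its term before stalling cannot write after an election.
        """
        with self._lock:
            if caller != self.leader or caller_term != self.leader_term:
                raise StaleLeaderError(
                    f"{caller} (term {caller_term}) is not the current leader "
                    f"{self.leader} (term {self.leader_term})")
            if replica == self.leader:
                raise ValueError("a leader must not put itself into recovery")
            self.lir_state[replica] = "down"


if __name__ == "__main__":
    shard = ShardState(leader="A")
    term_a = shard.leader_term  # A captures its term, then stalls on ZK
    shard.elect("B")            # meanwhile B is elected leader (term 2)
    try:
        # A wakes up and tries to put B (now the leader) into recovery.
        shard.mark_down("A", term_a, "B")
    except StaleLeaderError as exc:
        print("rejected:", exc)
```

The design choice here is to make the leadership check and the write a single atomic step; checking leadership first and writing afterwards would reopen the same window if the thread stalled between the two.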

Attachments


SOLR-7109.patch (13/Mar/15 13:35, 17 kB, Shalin Shekhar Mangar)
SOLR-7109.patch (02/Mar/15 18:39, 14 kB, Shalin Shekhar Mangar)

Issue Links

is related to: SOLR-8069 Ensure that only the valid ZooKeeper registered leader can put a replica into Leader Initiated Recovery. (Closed)

relates to: SOLR-7245 Temporary ZK election or connection loss should not stall indexing due to LIR (Closed)

People

Assignee: Shalin Shekhar Mangar

Reporter: Shalin Shekhar Mangar

Votes: 1

Watchers: 11

Dates

Created: 13/Feb/15 17:21

Updated: 09/May/16 18:58

Resolved: 15/Mar/15 18:42