SOLR-6530: Commits under network partition can put any node in down state

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 4.10.2, 5.0, 6.0
    • Component/s: SolrCloud
    • Labels: None

      Description

      Commits are executed by any node in SolrCloud, i.e. they are not routed via the leader like other updates.

      1. Suppose there is one collection with one shard and two replicas (A and B), and A is the leader.
      2. A commit request is made to node B while B cannot talk to A due to a partition for any reason (failing switch, heavy GC, whatever).
      3. B fails to distribute the commit to A (times out) and asks A to recover.
      4. This used to be harmless because a leader simply ignored recovery requests, but with the leader-initiated recovery (LIR) code, B puts A in the "down" state and A can never get out of that state.

      tl;dr: During network partitions, if enough commit/optimize requests are sent to the cluster, all the nodes in the cluster will eventually be marked as "down".
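      The following is a minimal SolrJ sketch of step 2, showing how a commit can land on an arbitrary replica rather than the leader. The host, port, and collection name are hypothetical, and the 6.x-style client builder varies across SolrJ versions:

      {code:java}
      import org.apache.solr.client.solrj.SolrClient;
      import org.apache.solr.client.solrj.impl.HttpSolrClient;

      public class CommitToReplica {
          public static void main(String[] args) throws Exception {
              // Hypothetical base URL pointing at replica B's core, not at leader A.
              try (SolrClient replicaB = new HttpSolrClient.Builder(
                      "http://nodeB:8983/solr/collection1").build()) {
                  // B accepts the commit and distributes it to the other replicas itself;
                  // unlike document updates, nothing routes this through the leader first.
                  replicaB.commit();
              }
          }
      }
      {code}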

      Attachments

      1. SOLR-6530.patch
        10 kB
        Shalin Shekhar Mangar
      2. SOLR-6530.patch
        9 kB
        Shalin Shekhar Mangar
      3. SOLR-6530.patch
        10 kB
        Shalin Shekhar Mangar
      4. SOLR-6530.patch
        9 kB
        Shalin Shekhar Mangar
      5. SOLR-6530.patch
        14 kB
        Shalin Shekhar Mangar
      6. SOLR-6530.patch
        8 kB
        Shalin Shekhar Mangar
      7. SOLR-6530.patch
        8 kB
        Shalin Shekhar Mangar
      8. SOLR-6530.patch
        6 kB
        Shalin Shekhar Mangar

          Activity

          Shalin Shekhar Mangar added a comment -

          A trivial test which demonstrates the problem by partitioning the leader from a replica and sending a commit to the replica which then marks the leader as "down".
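          The actual test is attached above; below is only a conceptual sketch of its shape using Solr's SocketProxy test helper. getProxyForLeader(), replicaBClient, and leaderState() are hypothetical stand-ins for the real test scaffolding:

          {code:java}
          import org.apache.solr.cloud.SocketProxy; // package location varies by version
          import org.junit.Test;

          @Test
          public void commitDuringPartitionMustNotDownTheLeader() throws Exception {
            SocketProxy toLeader = getProxyForLeader(); // hypothetical helper
            toLeader.close();                           // partition replica B from leader A
            try {
              replicaBClient.commit();                  // commit sent directly to B
            } catch (Exception expected) {
              // B times out forwarding the commit to A; it must NOT put A into recovery
            }
            toLeader.reopen();                          // heal the partition
            // the leader must still be live, not marked "down" (hypothetical helper)
            org.junit.Assert.assertNotEquals("down", leaderState());
          }
          {code}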

          Shalin Shekhar Mangar added a comment -

          The fix is trivial. I added checks to make sure that the current core is a leader before LIR code is executed.

          There is an isLeader variable in DUP which I thought about using but it is true by default and not updated for commits. I think the whole leader logic thing is very trappy and it needs some refactoring. I'll try to tackle that later in a separate issue.
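          Roughly, the shape of this first patch (illustrative, not the actual diff) is a leader check in front of the LIR call:

          {code:java}
          import org.apache.solr.cloud.CloudDescriptor;

          // Hypothetical helper: only run leader-initiated recovery if this core
          // believes it is the leader. DUP's own isLeader field is avoided because
          // it defaults to true and is not updated for commits.
          static void maybeInitiateRecovery(CloudDescriptor cloudDesc, Runnable lirAction) {
            if (!cloudDesc.isLeader()) {
              return; // a replica must never put another node into recovery
            }
            lirAction.run();
          }
          {code}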

          Mark Miller added a comment -

          Something like this came up in another JIRA issue as well; I can't remember which. I said it there too, but beyond this improvement, I don't think it makes a lot of sense for someone to ask for a recovery because a commit failed. There should be better / cheaper options.

          Shalin Shekhar Mangar added a comment -

          bq. I don't think it makes a lot of sense for someone to ask for a recovery because a commit failed. There should be better / cheaper options.

          I agree that commits shouldn't force recovery. What do you suggest? Should we specifically disallow commit requests from executing the LIR code?

          Alan Woodward added a comment -

          Think I just came across another version of this:

          1. A is leader, and is distributing docs to B
          2. A gets a large GC pause, during which B takes over as leader
          3. A wakes up again, continues to send docs from DistributedUpdateProcessor, but now gets errors in response because B is refusing the updates (as they're marked as FROMLEADER, and B is now the leader)
          4. In DUP.doFinish(), A finds that it has a bunch of errors from B, and so it attempts to put B into recovery, with the same result as point 4 in the issue description

          Ramkumar Aiyengar added a comment -

          In general, it's leader-initiated recovery, so if I am not the leader, I shouldn't be doing the logic for any operation. That's probably just commit for now since that's not forwarded to the leader, but if there's any other operation in the future which doesn't have to be coordinated by the leader, it could use the same logic?

          Shalin Shekhar Mangar added a comment -

          Here's a better fix which uses the global (ZK) state instead of the local state before executing the LIR code. From my reading of the code, the local isLeader variable in CloudDescriptor is not unset in all cases.
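          A hedged sketch of what "use the global (ZK) state" might look like; class and method names vary across Solr versions, and this is not the literal patch:

          {code:java}
          import org.apache.solr.cloud.CloudDescriptor;
          import org.apache.solr.cloud.ZkController;
          import org.apache.solr.common.cloud.Replica;

          // Consult the cluster state published in ZooKeeper rather than the locally
          // cached isLeader flag before running the LIR code.
          static boolean isLeaderAccordingToZk(ZkController zk, CloudDescriptor cloudDesc,
                                               String collection, String shardId) {
            Replica leader = zk.getZkStateReader().getClusterState()
                .getCollection(collection).getSlice(shardId).getLeader();
            return leader != null && leader.getName().equals(cloudDesc.getCoreNodeName());
          }
          {code}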

          Shalin Shekhar Mangar added a comment -

          Alan Woodward - That sounds right. This fix will help to a great extent but it isn't perfect. I think we may need to add some intelligence to the overseer to reject invalid state transitions in the cluster state. SOLR-6538 can also help in resolving such issues, e.g. a leader that has been put in the down state isn't aware of its published state and will never try to get out of it.

          Ramkumar Aiyengar - Yes, you're right that it could happen. We need to refactor this code so that these rules are properly defined and enforced.

          Shalin Shekhar Mangar added a comment -

          My last fix was not complete. Checking whether I am a leader is not enough because commits are broadcast to the entire collection without regard for shards. So it is still possible that a core which is the leader of shard2 runs the LIR code against a leader or replica of another shard.

          I've added a test case to reproduce this. The fix is again simple - we just don't run recovery for commits at all. A rough sketch of that rule follows.
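          A hypothetical condensation of the rule (method name made up, not the actual patch):

          {code:java}
          import org.apache.solr.client.solrj.request.AbstractUpdateRequest;
          import org.apache.solr.client.solrj.request.UpdateRequest;

          // A failed commit alone must never trigger leader-initiated recovery:
          // commits are broadcast to every shard, so a commit failure says nothing
          // about whether this core even leads the failed replica's shard.
          static boolean mayRecoverAfterFailure(UpdateRequest failedRequest) {
            if (failedRequest.getAction() == AbstractUpdateRequest.ACTION.COMMIT) {
              return false; // log and move on instead of demanding recovery
            }
            return true; // for real updates, the leader-only checks still apply
          }
          {code}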

          Ramkumar Aiyengar added a comment -

          Should LIR then be checked for: I am a leader and I am the leader for replica X which is not accessible?

          Shalin Shekhar Mangar added a comment -

          bq. Should LIR then be checked for: I am a leader and I am the leader for replica X which is not accessible?

          That will be done by the initial fix that I made. But that's not enough for commits because a commit is sent to all shards.

          Shalin Shekhar Mangar added a comment -

          Okay, now I get what you're saying. That's a good idea. I'll fix.

          Shalin Shekhar Mangar added a comment -

          Folding in the change suggested by Ramkumar.

          Shalin Shekhar Mangar added a comment -

          Patch with the right test. I accidentally included an old version of the test with my last patch.

          Shalin Shekhar Mangar added a comment -

          The last patch's test had a bug. It wasn't using the right proxies map. This is fixed now.

          Shalin Shekhar Mangar added a comment -

          Here's a better patch which removes the redundant isLeader check and also logs a warning if the errored node is not in the replica list of the current core's shard. All tests passed. This is ready.
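          Condensing the discussion above, the final rule looks roughly like this (an illustrative sketch with made-up names, not the committed code):

          {code:java}
          import org.apache.solr.common.cloud.Replica;
          import org.apache.solr.common.cloud.Slice;

          // Only ask a node to recover if this core is the leader of the failed
          // replica's own shard.
          static boolean mayInitiateRecovery(Slice myShard, String myCoreNodeName,
                                             String failedCoreNodeName) {
            Replica leader = myShard.getLeader();
            boolean iLeadThisShard =
                leader != null && leader.getName().equals(myCoreNodeName);
            // If the errored node isn't one of my shard's replicas (e.g. a broadcast
            // commit that hit another shard), log a warning and skip LIR entirely.
            boolean failedNodeIsMyReplica = myShard.getReplica(failedCoreNodeName) != null;
            return iLeadThisShard && failedNodeIsMyReplica;
          }
          {code}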

          ASF subversion and git services added a comment -

          Commit 1628945 from shalin@apache.org in branch 'dev/trunk'
          [ https://svn.apache.org/r1628945 ]

          SOLR-6530: Commits under network partitions can put any node in down state

          ASF subversion and git services added a comment -

          Commit 1629107 from shalin@apache.org in branch 'dev/trunk'
          [ https://svn.apache.org/r1629107 ]

          SOLR-6530: Reopen the socket proxy after test finishes

          ASF subversion and git services added a comment -

          Commit 1629108 from shalin@apache.org in branch 'dev/branches/branch_5x'
          [ https://svn.apache.org/r1629108 ]

          SOLR-6530: Commits under network partitions can put any node in down state

          ASF subversion and git services added a comment -

          Commit 1632236 from shalin@apache.org in branch 'dev/branches/lucene_solr_4_10'
          [ https://svn.apache.org/r1632236 ]

          SOLR-6530: Commits under network partitions can put any node in down state

          Shalin Shekhar Mangar added a comment -

          This is fixed. Thanks everyone!

          ASF subversion and git services added a comment -

          Commit 1633276 from shalin@apache.org in branch 'dev/trunk'
          [ https://svn.apache.org/r1633276 ]

          SOLR-6530: Protect against NPE when there are no live replicas

          ASF subversion and git services added a comment -

          Commit 1633277 from shalin@apache.org in branch 'dev/branches/branch_5x'
          [ https://svn.apache.org/r1633277 ]

          SOLR-6530: Protect against NPE when there are no live replicas

          ASF subversion and git services added a comment -

          Commit 1633278 from shalin@apache.org in branch 'dev/branches/lucene_solr_4_10'
          [ https://svn.apache.org/r1633278 ]

          SOLR-6530: Protect against NPE when there are no live replicas


            People

            • Assignee: Shalin Shekhar Mangar
            • Reporter: Shalin Shekhar Mangar
            • Votes: 0
            • Watchers: 10
