[SOLR-12187] Replica should watch clusterstate and unload itself if its entry is removed - ASF JIRA

Details

Type: Bug
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 7.4
Component/s: None
Labels: None

Description

Since the introduction of the autoscaling framework, we have seen an increase in the number of issues caused by race conditions between deleting a replica and other operations.

Case 1: DeleteReplicaCmd fails to send the UNLOAD request to a replica and therefore forcefully removes its entry from clusterstate, but the replica still functions normally and is able to become a leader -> ~~SOLR-12176~~
Case 2:

DeleteReplicaCmd enqueues a DELETECOREOP (without sending a request to the replica, because the node is not live)
The node starts and the replica gets loaded
The DELETECOREOP has not been processed yet, so the replica is still present in clusterstate --> it passes checkStateInZk
The DELETECOREOP is executed and DeleteReplicaCmd finishes
- result 1: the replica starts recovering, finishes, and publishes itself as ACTIVE --> the state of the replica is ACTIVE
- result 2: the replica throws an exception (probably an NPE)
  --> the state of the replica is DOWN and it does not join leader election
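
The behaviour requested in the issue title (a replica watches clusterstate and unloads itself once its entry is removed) can be illustrated with a small, self-contained sketch. This is not the code from the attached patch and uses no Solr classes: `FakeClusterState`, `FakeReplica`, and the replica name `core_node1` are hypothetical stand-ins, and the in-memory watcher list only approximates what a ZooKeeper-backed watch would do. It replays the Case 2 ordering: the replica loads while its entry is still present, and the delayed removal then triggers the self-unload.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Consumer;

// Illustrative sketch only: none of these types are Solr classes.
class ReplicaSelfUnloadSketch {

    /** Hypothetical in-memory stand-in for clusterstate: replica names plus watchers. */
    static class FakeClusterState {
        private final Set<String> replicas = new HashSet<>();
        private final List<Consumer<Set<String>>> watchers = new ArrayList<>();

        void addReplica(String name) { replicas.add(name); notifyWatchers(); }
        void removeReplica(String name) { replicas.remove(name); notifyWatchers(); }
        boolean contains(String name) { return replicas.contains(name); }
        void watch(Consumer<Set<String>> watcher) { watchers.add(watcher); }

        // Hand every watcher a snapshot so callbacks never see a half-updated set.
        private void notifyWatchers() {
            Set<String> snapshot = new HashSet<>(replicas);
            for (Consumer<Set<String>> w : new ArrayList<>(watchers)) {
                w.accept(snapshot);
            }
        }
    }

    /** Hypothetical replica that keeps watching the state after it has loaded. */
    static class FakeReplica {
        final String name;
        private boolean loaded;

        FakeReplica(String name, FakeClusterState state) {
            this.name = name;
            // One-time startup check, similar in spirit to checkStateInZk.
            if (!state.contains(name)) {
                throw new IllegalStateException("no clusterstate entry for " + name);
            }
            this.loaded = true;
            // The startup check alone is not enough in Case 2, so keep watching.
            state.watch(this::onClusterStateChanged);
        }

        void onClusterStateChanged(Set<String> currentReplicas) {
            if (loaded && !currentReplicas.contains(name)) {
                loaded = false; // a real core would release its resources here
            }
        }

        boolean isLoaded() { return loaded; }
    }

    /** Replays Case 2: load first, entry removed afterwards. */
    static boolean replicaLoadedAfterDelayedDelete() {
        FakeClusterState state = new FakeClusterState();
        state.addReplica("core_node1");          // entry still present at startup
        FakeReplica replica = new FakeReplica("core_node1", state);
        state.removeReplica("core_node1");       // the delayed delete is processed
        return replica.isLoaded();
    }

    public static void main(String[] args) {
        System.out.println("loaded after delete: " + replicaLoadedAfterDelayedDelete());
    }
}
```

The point of the sketch is that a one-time check at load time cannot close the window between enqueueing the delete and processing it, whereas a persistent watch reacts whenever the entry disappears.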

Attachments

- SOLR-12187.patch, 17/Apr/18 07:40, 32 kB, Cao Manh Dat
- SOLR-12187.patch, 12/Apr/18 08:09, 31 kB, Cao Manh Dat
- SOLR-12187.patch, 11/Apr/18 02:09, 29 kB, Cao Manh Dat
- SOLR-12187.patch, 10/Apr/18 12:23, 27 kB, Cao Manh Dat
- SOLR-12187.patch, 10/Apr/18 09:55, 26 kB, Cao Manh Dat
- SOLR-12187.patch, 05/Apr/18 22:57, 11 kB, Cao Manh Dat

People

Assignee: Cao Manh Dat

Reporter: Cao Manh Dat

Votes: 0

Watchers: 6

Dates

Created: 04/Apr/18 03:14

Updated: 02/Oct/19 17:30

Resolved: 18/Apr/18 13:07