Solr / SOLR-9446

Leader failure after creating a freshly replicated index can send nodes into recovery even if index was not changed

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 6.3, master (7.0)
    • Component/s: replication (java)
    • Security Level: Public (Default Security Level. Issues are Public)
    • Labels: None

      Description

      We noticed this issue while migrating a Solr index from machines A1, A2, and A3 to B1, B2, and B3. We followed the steps below (there were no updates during the migration process).

      • The index had replicas on machines A1, A2, and A3. Let's say A1 was the leader at the time.
      • We added 3 more replicas: B1, B2, and B3. These nodes synced with the leader via replication, so these fresh nodes have no tlogs.
      • We shut down one of the old nodes (A3).
      • We then shut down the leader (A1).
      • A new leader was elected (let's say A2).
      • The new leader asked all the replicas to sync with it.
      • The fresh nodes (the ones without tlogs) first tried PeerSync, but since there was no frame of reference, PeerSync failed and the fresh nodes fell back to replication.

      Although replication would not copy all the segments again, it seems we could short-circuit the sync and put nodes back into the active state as soon as possible.

      If a freshly replicated index becomes the leader for some reason, it can still send nodes (both other freshly replicated indexes and the old replicas) into recovery. Here is the scenario:

      • A freshly replicated node becomes the leader.
      • The new leader nevertheless asks all the replicas to sync with it.
      • The replicas (including the old ones) ask the leader for versions, but the leader has no update log, so the replicas cannot compute the missing versions and fall back to replication (see the sketch below).
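      A minimal sketch of why this version exchange fails when the leader has no update log (illustrative only; the class and method names below are hypothetical, not the actual Solr code):

          // Illustrative sketch: during PeerSync a replica asks the leader for recent versions.
          // A freshly replicated leader has no tlog, so it reports no versions, the replica has
          // no frame of reference, and the sync falls back to replication even though the
          // index contents may already be identical.
          import java.util.Collections;
          import java.util.List;

          class PeerSyncFallbackSketch {
              /** Versions the leader reports from its update log; empty if it has no tlog. */
              static List<Long> getVersionsFromLeader(boolean leaderHasTlog) {
                  return leaderHasTlog ? List.of(101L, 102L, 103L) : Collections.emptyList();
              }

              static boolean peerSync(boolean leaderHasTlog) {
                  List<Long> leaderVersions = getVersionsFromLeader(leaderHasTlog);
                  if (leaderVersions.isEmpty()) {
                      return false; // PeerSync fails; the caller falls back to replication
                  }
                  // Normal PeerSync would fetch and replay only the missing versions here.
                  return true;
              }
          }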
        Attachments

        • SOLR-9446.patch (14 kB, Pushkar Raste)


          Activity

          Pushkar Raste added a comment -

          I can think of a couple of ways to solve it using a fingerprint comparison:

          1. Add a fingerprint check in SyncStrategy.syncToMe() and request a replica to sync only if the fingerprint does not match
          2. Add a fingerprint check in RecoveryStrategy.doRecovery() and initiate recovery only if the fingerprint does not match
          3. Add a fingerprint check in PeerSync.sync() to check whether we are already in sync

          I think we almost always try PeerSync before trying replication, so #3 should work.
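          A rough sketch of the fingerprint idea behind all three options (hypothetical names; the actual patch uses Solr's IndexFingerprint class, whose exact fields and API are not reproduced here):

              // Hypothetical sketch: a fingerprint summarizes the versions/docs present in a core,
              // so two cores with equal fingerprints are already in sync and need no recovery.
              final class Fingerprint {
                  final long maxVersionEncountered;
                  final long versionsHash;
                  final long numDocs;

                  Fingerprint(long maxVersionEncountered, long versionsHash, long numDocs) {
                      this.maxVersionEncountered = maxVersionEncountered;
                      this.versionsHash = versionsHash;
                      this.numDocs = numDocs;
                  }

                  /** True if the other core already holds the same data as this one. */
                  boolean matches(Fingerprint other) {
                      return maxVersionEncountered == other.maxVersionEncountered
                          && versionsHash == other.versionsHash
                          && numDocs == other.numDocs;
                  }
              }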

          Pushkar Raste added a comment -

          It also seems that if I take either approach #1 or approach #2, I will have to add the check in more than one place to cover multiple scenarios (e.g. LIR, a node coming out of a long GC pause, the getVersions call to RealTimeGetComponent with sync).

          As I mentioned in the last comment, since we always try PeerSync first, adding a check in PeerSync.sync() seems like the easiest/cleanest way to fix it.
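          Building on the Fingerprint sketch above, a rough outline of where such a check could sit in the sync path (approach #3); again these are hypothetical names, not the actual PeerSync code:

              // Hypothetical placement of the short circuit: compare fingerprints before doing
              // any version exchange, and report success immediately if they already match.
              final class SyncShortCircuitSketch {
                  private final Fingerprint leaderFingerprint;
                  private final Fingerprint myFingerprint;

                  SyncShortCircuitSketch(Fingerprint leaderFingerprint, Fingerprint myFingerprint) {
                      this.leaderFingerprint = leaderFingerprint;
                      this.myFingerprint = myFingerprint;
                  }

                  boolean sync() {
                      if (leaderFingerprint != null && leaderFingerprint.matches(myFingerprint)) {
                          // Indexes already match; stay ACTIVE instead of dropping into recovery.
                          return true;
                      }
                      return syncVersionsFromLeader();
                  }

                  private boolean syncVersionsFromLeader() {
                      // Placeholder for the normal PeerSync flow (request versions, replay missing ones).
                      return false;
                  }
              }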

          ASF GitHub Bot added a comment -

          GitHub user praste opened a pull request:

          https://github.com/apache/lucene-solr/pull/73

          SOLR-9446 Do a fingerprint check before starting PeerSync

          You can merge this pull request into a Git repository by running:

          $ git pull https://github.com/praste/lucene-solr SOLR-9446

          Alternatively you can review and apply these changes as the patch at:

          https://github.com/apache/lucene-solr/pull/73.patch

          To close this pull request, make a commit to your master/trunk branch
          with (at least) the following in the commit message:

          This closes #73


          commit 82e2fb5914a202f7577b92b999370cfb6fcc605b
          Author: Pushkar Raste <praste@bloomberg.net>
          Date: 2016-08-26T17:50:40Z

          SOLR-9446 Do a fingerprint check before starting PeerSync


          Noble Paul added a comment -

          I found the following assertion commented out in the test case:

          // assertEquals("FreshNode went into recovery", numRequestsBefore, numRequestsAfter);
          

          I tested by uncommenting it; it passes anyway.
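          For context, a rough sketch of what that assertion guards (hypothetical helper names; the real test's setup is not shown here): the fresh replica's replication handler should see no new fetch requests across the leader change.

              // Hypothetical outline: capture the replication request count on the fresh replica,
              // force a leader change, and assert the count did not grow, i.e. the fresh node
              // did not fall back into replication recovery.
              long numRequestsBefore = getReplicationRequestCount(freshReplica);
              killCurrentLeader();
              waitForNewLeaderAndActiveReplicas();
              long numRequestsAfter = getReplicationRequestCount(freshReplica);
              assertEquals("FreshNode went into recovery", numRequestsBefore, numRequestsAfter);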

          ASF subversion and git services added a comment -

          Commit 15cee3141c160c38756ceed73bd1cd88002c01c9 in lucene-solr's branch refs/heads/master from Noble Paul
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=15cee31 ]

          SOLR-9446: Leader failure after creating a freshly replicated index can send nodes into recovery even if index was not changed

          ASF subversion and git services added a comment -

          Commit 8502995e3b1ce66db49be26b23a3fa3c169345a8 in lucene-solr's branch refs/heads/branch_6x from Noble Paul
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=8502995 ]

          SOLR-9446: Leader failure after creating a freshly replicated index can send nodes into recovery even if index was not changed

          Jim Musil added a comment -

          FWIW, this was a particularly bad problem for us. In the scenario outlined in the description, our old nodes were going down at different times, generally while the new nodes were in recovery. This produced a situation where all the live nodes were in recovery but could never recover. The new nodes did not serve requests and the collection was dead in the water.

          Shalin Shekhar Mangar added a comment -

          Closing after 6.3.0 release.


            People

            • Assignee: Noble Paul
            • Reporter: Pushkar Raste
            • Votes: 0
            • Watchers: 9
