Details
- Type: Sub-task
- Status: Resolved
- Priority: Major
- Resolution: Fixed
Description
The current leadership transfer implementation in Ratis depends on priority: the current leader periodically checks its followers' priorities and yields leadership to a higher-priority peer.
In this Jira, I propose implementing the basic transfer leadership operation described in Section 3.10, "Leadership transfer extension", of Diego Ongaro's PhD dissertation. In a future Jira, we can change the current "yieldLeaderToHigherPriorityPeer()" to use this operation.
Steps of the transfer leadership operation:
- The prior leader stops accepting new client requests.
- The prior leader fully updates the target server’s log to match its own, using the normal log replication mechanism.
- The prior leader sends a TimeoutNow request to the target server. This request has the same effect as the target server’s election timer firing: the target server starts a new election (incrementing its term and becoming a candidate).
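The three steps above can be sketched from the prior leader's side as follows. This is a minimal in-memory illustration, not the Ratis implementation: the class and method names (TransferLeadershipSketch, Leader, Target, submit, transferLeadership, timeoutNow) are all hypothetical, and log catch-up is simplified by assuming the target's log is already a prefix of the leader's.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; none of these names are Ratis APIs.
class TransferLeadershipSketch {

  /** Minimal stand-in for the target server. */
  static class Target {
    final List<String> log = new ArrayList<>();
    boolean timeoutNowReceived = false;

    void append(String entry) {
      log.add(entry);
    }

    void timeoutNow() {
      timeoutNowReceived = true;
    }
  }

  /** Minimal stand-in for the prior leader. */
  static class Leader {
    final List<String> log = new ArrayList<>();
    boolean acceptingClientRequests = true;

    /** Returns false when the request is rejected because a transfer is in progress. */
    boolean submit(String entry) {
      if (!acceptingClientRequests) {
        return false;
      }
      log.add(entry);
      return true;
    }

    void transferLeadership(Target target) {
      // Step 1: stop accepting new client requests.
      acceptingClientRequests = false;

      // Step 2: bring the target's log up to date.
      // Assumption: the target's log is a prefix of the leader's log,
      // so only the missing suffix has to be appended.
      for (int i = target.log.size(); i < log.size(); i++) {
        target.append(log.get(i));
      }

      // Step 3: ask the target to start an election immediately.
      target.timeoutNow();
    }
  }
}
```

In a real server, step 2 would go through the normal log replication path and step 3 would be an RPC; the sketch only shows the ordering of the steps.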
Success condition:
- Once the target server receives the TimeoutNow request, it is highly likely to start an election before any other server and become leader in the next term. Its next message to the prior leader will include its new term number, causing the prior leader to step down. At this point, leadership transfer is complete.
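The success path can be illustrated with a small state sketch, assuming the usual Raft rule that a server seeing a higher term adopts it and becomes a follower. The names (TimeoutNowSketch, Server, onTimeoutNow, onMessageWithTerm) are hypothetical, and vote collection is left out entirely.

```java
// Hypothetical sketch; none of these names are Ratis APIs.
class TimeoutNowSketch {

  enum Role { FOLLOWER, CANDIDATE, LEADER }

  static class Server {
    Role role;
    long currentTerm;

    Server(Role role, long currentTerm) {
      this.role = role;
      this.currentTerm = currentTerm;
    }

    /** Same effect as the election timer firing: new term, become candidate. */
    void onTimeoutNow() {
      currentTerm++;
      role = Role.CANDIDATE;
    }

    /** A message carrying a higher term forces this server to step down. */
    void onMessageWithTerm(long term) {
      if (term > currentTerm) {
        currentTerm = term;
        role = Role.FOLLOWER;
      }
    }
  }
}
```

For example, if both servers are at term 5, the target moves to term 6 on TimeoutNow, and the prior leader becomes a follower at term 6 as soon as it processes any message from the target.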
Failure condition:
- It is also possible for the target server to fail; in this case, the cluster must resume client operations. If leadership transfer does not complete after about an election timeout, the prior leader aborts the transfer and resumes accepting client requests. If the prior leader was mistaken and the target server is actually operational, then at worst this mistake will result in an extra election, after which client operations will be restored.
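A sketch of how the prior leader might decide between completing and aborting a transfer is shown below. It assumes the leader polls a pending-transfer record with the current time and a flag saying whether it has seen a higher term; the names (TransferTimeoutSketch, PendingTransfer, check) and the millisecond-based deadline are illustrative assumptions, not the actual Ratis design.

```java
// Hypothetical sketch; none of these names are Ratis APIs.
class TransferTimeoutSketch {

  enum Outcome { PENDING, COMPLETED, ABORTED }

  static class PendingTransfer {
    final long startMillis;
    final long electionTimeoutMillis;
    // Client requests are blocked for as long as the transfer is pending.
    boolean acceptingClientRequests = false;
    Outcome outcome = Outcome.PENDING;

    PendingTransfer(long startMillis, long electionTimeoutMillis) {
      this.startMillis = startMillis;
      this.electionTimeoutMillis = electionTimeoutMillis;
    }

    /** Re-evaluates the transfer; once decided, the outcome no longer changes. */
    Outcome check(long nowMillis, boolean sawHigherTerm) {
      if (outcome != Outcome.PENDING) {
        return outcome;
      }
      if (sawHigherTerm) {
        // The prior leader steps down; the transfer is complete.
        outcome = Outcome.COMPLETED;
      } else if (nowMillis - startMillis >= electionTimeoutMillis) {
        // Roughly one election timeout has passed: abort and serve clients again.
        outcome = Outcome.ABORTED;
        acceptingClientRequests = true;
      }
      return outcome;
    }
  }
}
```

With a 150 ms election timeout, a check at 100 ms without a higher term leaves the transfer pending, while a check at 150 ms aborts it and re-enables client requests.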