Description
Relative to the re-replication support outlined in KUDU-1096, we can do better in terms of availability properties. Here is a rough outline of such a design.
Design:
- When a voter falls behind the leader's log GC threshold, the leader notifies the Master that the voter is no longer up to date.
- The Master selects a node to act as a replacement. It adds that node as a PRE_VOTER to the config (see KUDU-869) and when that node is caught up, it is automatically promoted to a VOTER.
- When the Master detects that the node has been promoted, it removes the bad node from the config.
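A minimal sketch of the flow above, written in Python for brevity and purely illustrative. All names here (`RaftConfig`, `fell_behind`, `start_replacement`, `on_caught_up`, `finish_replacement`) are hypothetical and are not Kudu APIs; the log-index comparison in `fell_behind` is an assumed way of expressing "behind the log GC threshold".

```python
from dataclasses import dataclass, field
from enum import Enum


class Role(Enum):
    VOTER = "VOTER"
    PRE_VOTER = "PRE_VOTER"


@dataclass
class RaftConfig:
    # Hypothetical model: node id -> membership type.
    members: dict = field(default_factory=dict)

    def voters(self):
        return sorted(n for n, r in self.members.items() if r is Role.VOTER)


def fell_behind(follower_next_index, leader_first_retained_index):
    """Leader-side check (assumed): the follower needs entries already GCed."""
    return follower_next_index < leader_first_retained_index


def start_replacement(config, replacement):
    """Master step: add the replacement as a PRE_VOTER; the bad voter stays."""
    config.members[replacement] = Role.PRE_VOTER


def on_caught_up(config, node):
    """Promote a caught-up PRE_VOTER to VOTER."""
    if config.members.get(node) is Role.PRE_VOTER:
        config.members[node] = Role.VOTER


def finish_replacement(config, bad_node, replacement):
    """Master step: evict the bad voter only once the replacement is a VOTER."""
    if config.members.get(replacement) is Role.VOTER:
        config.members.pop(bad_node, None)
        return True
    return False
```

The point of the ordering is that the bad voter is evicted last, so the number of voters never drops below the original count while the replacement is still catching up.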
Additional cases to detect and handle:
- If the config is in such a state that it would be impossible to add a node, due to a voter that has fallen behind the log GC threshold being in the required majority, then remotely bootstrap that voter without changing the config. The tablet will continue to be unable to serve writes during this time, but will self-heal without administrator intervention.
This case can be further improved by adding support for aborting a config-change operation that cannot commit (see KUDU-1194).
The design as a whole requires some additional plumbing from the leader to the Master, so that the leader can notify the Master of slow followers.
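The additional case above can be sketched as a simple decision. This is a rough illustration, assuming that committing a config change needs acknowledgement from a majority of the current voters and that a voter behind the log GC threshold cannot acknowledge it. The function names and the returned labels are hypothetical.

```python
def majority_size(num_voters):
    """Smallest number of voters that forms a majority."""
    return num_voters // 2 + 1


def choose_recovery(voters, lagging, unreachable=frozenset()):
    """Pick a recovery path for a config with lagging voters.

    voters:      ids of all voters in the current config
    lagging:     voters behind the leader's log GC threshold
    unreachable: voters that are down or partitioned (assumed input)
    """
    healthy = [v for v in voters if v not in lagging and v not in unreachable]
    if len(healthy) >= majority_size(len(voters)):
        # A config change can still commit: add a PRE_VOTER replacement.
        return "replace_via_pre_voter"
    # The lagging voter is needed for the majority: bootstrap it in place,
    # leaving the config unchanged.
    return "bootstrap_in_place"
```

For example, with three voters where one is lagging and the other two are healthy, the replacement path applies; if a second voter is also unreachable, only in-place bootstrap can restore a majority.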
Pros:
- Closer to optimal fault-tolerance properties: "majority lost" is less likely to occur, so administrator intervention is needed less often.
Cons:
- Requires PRE_VOTER support and a smarter Master.
Issue Links
- is blocked by
  - KUDU-869 Support PRE_VOTER config membership type (Resolved)
  - KUDU-1033 Capability to delete & bootstrap followers that fall too far behind log (Resolved)
  - KUDU-1194 consensus: Allow abort of uncommittable config change ops (Open)
- is related to
  - KUDU-1096 Re-replication support for Kudu beta (Resolved)
- relates to
  - KUDU-1449 tablet unavailable caused by follower can not upgrade to leader. (Resolved)