Relative to the re-replication support outlined in KUDU-1096, we can do better in terms of availability properties. Here is a rough outline of such a design:
- When a voter falls behind the leader's log GC threshold, the leader notifies the Master that the voter is no longer up to date.
- The Master selects a node to act as a replacement. It adds that node as a PRE_VOTER to the config (see KUDU-869); once that node has caught up, it is automatically promoted to a VOTER.
- When the Master detects that the node has been promoted, it removes the bad node from the config.
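
The three steps above can be sketched as a small simulation. This is only an illustration of the ordering, not Kudu code: the `Config` class, the `replace_lagging_voter` function, and the node names are hypothetical, and the catch-up phase is collapsed into a single step. It records the voter count after each action to show that the bad node is evicted only after its replacement is already a voter.

```python
from dataclasses import dataclass, field
from enum import Enum


class Role(Enum):
    VOTER = "VOTER"
    PRE_VOTER = "PRE_VOTER"


@dataclass
class Config:
    # Maps a hypothetical node id to its role in the consensus config.
    members: dict = field(default_factory=dict)

    def voters(self):
        return sorted(n for n, r in self.members.items() if r is Role.VOTER)


def replace_lagging_voter(config, bad_node, spare_nodes):
    """Apply the three steps in order; return (action, voter_count) pairs."""
    if not spare_nodes:
        raise ValueError("no spare node available to act as a replacement")
    history = []

    # Step 1: the leader tells the Master that bad_node fell behind log GC.
    history.append((f"report {bad_node} behind log GC", len(config.voters())))

    # Step 2: the Master adds a replacement as a PRE_VOTER. It does not
    # count as a voter yet, so the voter count is unchanged.
    replacement = spare_nodes[0]
    config.members[replacement] = Role.PRE_VOTER
    history.append((f"add {replacement} as PRE_VOTER", len(config.voters())))

    # Once caught up (simulated here as immediate), it is promoted.
    config.members[replacement] = Role.VOTER
    history.append((f"promote {replacement} to VOTER", len(config.voters())))

    # Step 3: only after the promotion is the bad node removed.
    del config.members[bad_node]
    history.append((f"remove {bad_node}", len(config.voters())))
    return history
```

Under these assumptions, a three-voter config passes through four voters before returning to three, and never drops to two.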
Additional cases to detect and handle:
- If the config is in a state where adding a node is impossible, because a voter that has fallen behind the log GC threshold is part of the required majority, then remotely bootstrap that voter without changing the config. The tablet remains unable to serve writes during this time, but it self-heals without administrator intervention.
This can be further improved by adding support for aborting a config-change operation that cannot commit.
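
As a rough sketch of how this case might be distinguished from the normal replacement path, the function below counts the voters that could still acknowledge a config change. Everything here is hypothetical: the `Voter` fields, the action names, and the simplifying assumption that a config change needs acknowledgements from a majority of the current voters (the leader is included in the list). The `NEEDS_ADMIN` outcome is an added assumption for the situation where even bootstrapping the lagging voters would not restore a majority.

```python
from dataclasses import dataclass


@dataclass
class Voter:
    name: str
    reachable: bool      # currently responding to the leader
    behind_log_gc: bool  # needs log entries the leader has already GCed


def choose_recovery_action(voters):
    """Pick a recovery path for a config with at least one lagging voter."""
    majority = len(voters) // 2 + 1

    # Voters that can still replicate and acknowledge a new operation.
    able_to_ack = [v for v in voters if v.reachable and not v.behind_log_gc]
    if len(able_to_ack) >= majority:
        # A config change can commit: add a replacement as a PRE_VOTER.
        return "ADD_PRE_VOTER"

    # Voters that are alive but cannot catch up from the leader's log.
    laggards = [v for v in voters if v.reachable and v.behind_log_gc]
    if len(able_to_ack) + len(laggards) >= majority:
        # A lagging voter is needed for the majority: bootstrap it in place.
        return "BOOTSTRAP_IN_PLACE"

    # Assumed fallback: too few live voters for either path.
    return "NEEDS_ADMIN"
```

For example, with three voters where one is unreachable and one is behind the log GC threshold, only the leader can acknowledge operations, so this sketch returns `BOOTSTRAP_IN_PLACE`.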
This requires some additional plumbing from the leader to the Master to notify it of slow followers.
- Closer to optimal fault-tolerance properties: a "majority lost" state is less likely to occur, so administrator intervention is less likely to be needed.
- Requires support for pre-voters and a smarter Master.