Details
- Type: Sub-task
- Status: Resolved
- Priority: Critical
- Resolution: Fixed
- Fix Version/s: Feature Complete
- Component/s: None
- Labels: None
Description
We want to add initial support for re-replication for the beta release.
Design:
- When a leader detects that a follower has fallen behind to the point that it can't catch up, it will trigger a "remove server" config change.
- When the master receives a tablet report and sees that the number of replicas in the config is less than the table's desired replication factor, it will itself start a task to create a new replica (sketched below).
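A minimal sketch of that master-side check, assuming hypothetical names (TabletReport, StartAddReplicaTask, HandleTabletReport are illustrative, not Kudu's actual types; only AddServer() is named in this issue):
{code:cpp}
#include <iostream>
#include <string>
#include <vector>

// Hypothetical digest of a tablet report; not Kudu's actual struct.
struct TabletReport {
  std::string tablet_id;
  std::vector<std::string> replica_uuids;  // current Raft config members
};

// Hypothetical hook that would enqueue the master task which eventually
// calls AddServer() with a newly chosen replica.
void StartAddReplicaTask(const std::string& tablet_id) {
  std::cout << "enqueue add-replica task for " << tablet_id << "\n";
}

// On each tablet report, compare the reported config size against the
// table's desired replication factor; re-replicate if under-replicated.
void HandleTabletReport(const TabletReport& report, int desired_replication) {
  if (static_cast<int>(report.replica_uuids.size()) < desired_replication) {
    StartAddReplicaTask(report.tablet_id);
  }
}

int main() {
  // A 2-replica config reported for a table with replication factor 3.
  HandleTabletReport({"tablet-0001", {"ts-a", "ts-b"}}, 3);
}
{code}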
Details:
- Let's start with choosing randomly among any tservers whose most recent heartbeat arrived within the last 3 heartbeat periods, as a reasonable proxy for "live tservers" (see the sketch after this list). Later we can do something smarter, like "power of two choices" or load-aware placement. Random placement isn't optimal, but it also has the least risk of causing weird emergent behavior.
- The master task will call AddServer() to add the newly selected replica.
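A minimal sketch of that random "live tserver" selection, again with hypothetical names (TServerInfo, PickRandomLiveTServer; not Kudu's actual API):
{code:cpp}
#include <chrono>
#include <optional>
#include <random>
#include <string>
#include <vector>

using Clock = std::chrono::steady_clock;

// Hypothetical view of a tserver's liveness state; not Kudu's actual struct.
struct TServerInfo {
  std::string uuid;
  Clock::time_point last_heartbeat;
};

// Pick a random tserver whose most recent heartbeat arrived within
// `max_periods` heartbeat periods, as a proxy for "live".
std::optional<std::string> PickRandomLiveTServer(
    const std::vector<TServerInfo>& tservers,
    Clock::duration heartbeat_period,
    int max_periods = 3) {
  const auto cutoff = Clock::now() - max_periods * heartbeat_period;
  std::vector<const TServerInfo*> live;
  for (const auto& ts : tservers) {
    if (ts.last_heartbeat >= cutoff) live.push_back(&ts);
  }
  if (live.empty()) return std::nullopt;
  static std::mt19937 rng{std::random_device{}()};
  std::uniform_int_distribution<size_t> dist(0, live.size() - 1);
  return live[dist(rng)]->uuid;
}
{code}
In practice the candidate set would presumably also exclude tservers already hosting a replica of the tablet.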
Additional possible refinements:
- We should also trigger this same process if the leader detects that it hasn't successfully sent a request to a follower in N heartbeat periods (the guards in this list are combined in the sketch below).
- We should build in a safety net for the case where the follower is actually still in the middle of bootstrapping and making progress; otherwise we could flap.
- We probably want to prohibit the leader from doing this unless it knows it's still within its "lease period". Otherwise, we might too easily drop to a 1-node config if we get down to a 2-node config and the leader itself has some issue.
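A sketch combining these three guards on the leader side, with hypothetical state and names (FollowerState, ShouldEvictFollower are illustrative, not Kudu's actual code):
{code:cpp}
#include <chrono>

using Clock = std::chrono::steady_clock;

// Hypothetical per-follower state tracked by the leader; illustrative only.
struct FollowerState {
  Clock::time_point last_successful_rpc;
  bool bootstrapping;     // still copying/replaying data
  bool making_progress;   // e.g. its applied index advanced recently
};

// Decide whether the leader should trigger a "remove server" config change,
// folding in the refinements above: an unreachability threshold of N
// heartbeat periods, a bootstrap safety net against flapping, and a
// leader-lease check.
bool ShouldEvictFollower(const FollowerState& f,
                         Clock::duration heartbeat_period,
                         int unreachable_periods,   // the "N" above
                         bool leader_within_lease) {
  // Never evict unless the leader knows it still holds its lease.
  if (!leader_within_lease) return false;
  // Don't flap: a follower that is still bootstrapping and advancing
  // is behind by design, not failed.
  if (f.bootstrapping && f.making_progress) return false;
  const auto deadline =
      f.last_successful_rpc + unreachable_periods * heartbeat_period;
  return Clock::now() > deadline;
}
{code}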
Pros:
- A fairly simple approach to re-replication.
Cons:
- Availability is less than optimal. For example, if a follower is slow enough to fall behind the log, causing the leader to remove it from the Raft config, and the leader simultaneously fails (e.g. a bad disk), then administrator intervention will be required to bring the tablet back online, since the only remaining replica will be unable to get elected.
Issue Links
- is blocked by
  - KUDU-933 Master should delete replicas that are no longer part of a tablet config (Resolved)
  - KUDU-1129 Add Atomic CAS option to DeleteTablet() (Resolved)
- is duplicated by
  - KUDU-870 Detect and replace failed replicas at the Master level (Resolved)
- relates to
  - KUDU-1097 Higher availability re-replication support (Resolved)