
Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Fix Version/s: Feature Complete
    • Affects Version/s: None
    • Component/s: consensus
    • Labels: None

    Description

      We want to add initial support for re-replication for the beta release.

      Design:

      1. When a leader detects that a follower has fallen behind to the point that it can't catch up, it will trigger a "remove server" config change (a minimal sketch of this check follows this list).
      2. When the master gets a report from a tablet and sees that the number of replicas in the config is less than the table's desired replication, it will itself start a task to create a new replica.
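
      For illustration, here is a minimal leader-side sketch of step 1 in C++. Peer, LeaderState, and ProposeRemoveServer() are made-up names for this sketch, not Kudu's actual classes or RPCs, and the "can't catch up" test is simplified to "the follower needs a log index that has already been GC'd":

      #include <cstdint>
      #include <iostream>
      #include <string>
      #include <vector>

      struct Peer {
        std::string uuid;
        int64_t next_index;  // next log index the leader would send to this peer
      };

      struct LeaderState {
        int64_t log_min_retained_index;  // oldest log entry the leader still has
        std::vector<Peer> peers;
      };

      // Stand-in for committing a "remove server" Raft config change.
      void ProposeRemoveServer(const std::string& peer_uuid) {
        std::cout << "proposing removal of " << peer_uuid << "\n";
      }

      // A follower whose next needed index has already been GC'd from the
      // leader's log can never catch up via normal replication, so the leader
      // triggers its removal from the config.
      void MaybeEvictLaggingFollowers(const LeaderState& leader) {
        for (const Peer& peer : leader.peers) {
          if (peer.next_index < leader.log_min_retained_index) {
            ProposeRemoveServer(peer.uuid);
          }
        }
      }

      int main() {
        LeaderState leader{/*log_min_retained_index=*/100,
                           {{"ts-a", 150}, {"ts-b", 42}}};  // ts-b has fallen behind
        MaybeEvictLaggingFollowers(leader);  // prints: proposing removal of ts-b
        return 0;
      }

      Once the config change commits, the master sees the smaller config in the next tablet report and reacts as described in step 2.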

      Details:

      1. Let's start with choosing randomly among any tservers whose most recent heartbeat arrived within the last 3 heartbeat periods, as a reasonable proxy for "live tservers" (see the selection sketch after this list). Later we can do something smarter like "power of two choices" or load-aware placement. Random placement isn't optimal, but it also has the least risk of causing weird emergent behavior.
      2. The master task will call AddServer() to add the newly selected replica.
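
      A rough sketch of the master-side logic from these two points, in C++. TSDescriptor, PickReplacementReplica(), and the 1-second heartbeat period are assumptions made for the sketch; only AddServer() comes from the description above, and it is referenced only in a comment:

      #include <algorithm>
      #include <chrono>
      #include <iostream>
      #include <optional>
      #include <random>
      #include <string>
      #include <vector>

      using Clock = std::chrono::steady_clock;

      // Stand-in for the master's view of a tablet server.
      struct TSDescriptor {
        std::string uuid;
        Clock::time_point last_heartbeat;
      };

      constexpr auto kHeartbeatPeriod = std::chrono::seconds(1);  // assumed value

      // "Live tserver" proxy: most recent heartbeat within the last 3 periods.
      bool IsLive(const TSDescriptor& ts, Clock::time_point now) {
        return now - ts.last_heartbeat <= 3 * kHeartbeatPeriod;
      }

      // Pick uniformly at random among live tservers that don't already host a
      // replica of the tablet; return nullopt if there is no candidate.
      std::optional<std::string> PickReplacementReplica(
          const std::vector<TSDescriptor>& tservers,
          const std::vector<std::string>& existing_replicas,
          std::mt19937& rng) {
        const Clock::time_point now = Clock::now();
        std::vector<std::string> candidates;
        for (const TSDescriptor& ts : tservers) {
          const bool already_hosting =
              std::find(existing_replicas.begin(), existing_replicas.end(),
                        ts.uuid) != existing_replicas.end();
          if (IsLive(ts, now) && !already_hosting) {
            candidates.push_back(ts.uuid);
          }
        }
        if (candidates.empty()) return std::nullopt;
        std::uniform_int_distribution<size_t> dist(0, candidates.size() - 1);
        return candidates[dist(rng)];
      }

      int main() {
        std::mt19937 rng(std::random_device{}());
        const auto now = Clock::now();
        std::vector<TSDescriptor> tservers = {
            {"ts-a", now},                             // live, but already a replica
            {"ts-b", now - std::chrono::seconds(10)},  // heartbeat too stale
            {"ts-c", now},                             // live candidate
        };
        if (auto chosen = PickReplacementReplica(tservers, {"ts-a"}, rng)) {
          // The master task would then call AddServer() for *chosen.
          std::cout << "selected " << *chosen << " for the new replica\n";
        }
        return 0;
      }

      Keeping the selection to uniform random among recently heartbeating tservers means the master needs no bookkeeping beyond the heartbeat times it already tracks, which fits the low-risk spirit of the choice above.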

      Additional possible refinements:

      1. We should also trigger this same process if the leader detects that it hasn't successfully sent a request to a follower for N heartbeat periods (a combined gating sketch follows this list).
      2. We should build in some safety net here in case the follower is actually still in the middle of bootstrapping and making progress; otherwise we could flap.
      3. We probably want to prohibit the leader from doing this unless it knows it's still within its "lease period". Otherwise, if we get down to a 2-node config and the leader itself has some issue, we might too easily drop to a 1-node config.
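
      These three refinements compose into a single eviction gate. Below is a hedged C++ sketch where every name and threshold is an assumption (kMaxFailedPeriods simply stands in for N), not Kudu's implementation:

      #include <cstdint>
      #include <iostream>

      struct FollowerHealth {
        int64_t failed_heartbeat_periods;  // consecutive periods with no successful request
        bool bootstrapping;                // still copying / replaying data
        bool making_progress;              // e.g. its applied index advanced recently
      };

      struct LeaderHealth {
        bool within_lease;  // leader believes it is still within its lease period
      };

      constexpr int64_t kMaxFailedPeriods = 10;  // "N" from refinement 1, arbitrary here

      bool MayEvict(const FollowerHealth& follower, const LeaderHealth& leader) {
        // Refinement 3: don't shrink the config unless we're confident we are
        // still the legitimate leader; otherwise a 2-node config could too
        // easily collapse to 1 node.
        if (!leader.within_lease) return false;

        // Refinement 2: a follower that is still bootstrapping and making
        // progress should be left alone, or we will add/remove replicas in a
        // loop (flapping).
        if (follower.bootstrapping && follower.making_progress) return false;

        // Refinement 1: N heartbeat periods without a successful request is
        // treated the same as "fallen behind and can't catch up".
        return follower.failed_heartbeat_periods >= kMaxFailedPeriods;
      }

      int main() {
        FollowerHealth unreachable{12, false, false};
        LeaderHealth leader{true};
        std::cout << std::boolalpha << MayEvict(unreachable, leader) << "\n";  // true
        return 0;
      }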

      Pros:

      • Fairly simple and easy approach to re-replication.

      Cons:

      • Availability is less than optimal. For example, if a follower is slow enough to fall behind the log, causing the leader to remove it from the Raft config, and the leader itself fails at the same time (e.g. a bad disk), then administrator intervention will be required to bring the cluster back online, since the only remaining replica will be unable to get elected.

            People

            Assignee: Mike Percy (mpercy)
            Reporter: Mike Percy (mpercy)

