Kudu / KUDU-2800

Avoid 'unintended' re-replication of long-bootstrapping tablet replicas


    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Invalid
    • Affects Version/s: 1.7.0, 1.8.0, 1.7.1, 1.9.0, 1.9.1, 1.10.0
    • Fix Version/s: 1.11.0, 1.11.1
    • Component/s: consensus, tserver
    • Labels:

      Description

      As implemented in https://github.com/apache/kudu/blob/10ea0ce5a636a050a1207f7ab5ecf63d178683f5/src/kudu/consensus/consensus_queue.cc#L576, the logic for tracking the 'health' of tablet replicas cannot differentiate between bootstrapping and failed replicas.

      As a result, if a tablet replica takes longer to bootstrap than the interval specified by the --follower_unavailable_considered_failed_sec run-time flag, the system can start re-replicating the tablet replica elsewhere.
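      The problem can be illustrated with a deliberately simplified C++ sketch. It is not Kudu's code: `PeerHealth`, `FollowerState` and `ClassifyFollower` are hypothetical names, and the `unavailable_considered_failed` parameter merely stands in for the --follower_unavailable_considered_failed_sec flag. The point is that a leader which only tracks how long a follower has been silent sees a long-bootstrapping replica exactly as it sees a dead one.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical, simplified model of how a leader might track a follower's
// health. Not Kudu's implementation; names are illustrative only.
enum class PeerHealth { kHealthy, kFailed };

struct FollowerState {
  // Time of the last successful Raft exchange with this follower.
  std::chrono::steady_clock::time_point last_successful_exchange;
};

// Classifies a follower purely by how long it has been silent.
// `unavailable_considered_failed` stands in for the
// --follower_unavailable_considered_failed_sec flag. A bootstrapping replica
// cannot answer Raft requests, so here it looks exactly like a crashed one.
PeerHealth ClassifyFollower(const FollowerState& follower,
                            std::chrono::steady_clock::time_point now,
                            std::chrono::seconds unavailable_considered_failed) {
  assert(unavailable_considered_failed.count() > 0);
  const auto silence = now - follower.last_successful_exchange;
  return silence > unavailable_considered_failed ? PeerHealth::kFailed
                                                 : PeerHealth::kHealthy;
}
```

      Because elapsed silence is the only input, no single timeout value separates the two cases: raising it delays re-replication of genuinely failed replicas, while lowering it causes more slow-bootstrapping replicas to be replaced.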

      One option might be to send a specific error in ConsensusResponsePB in response to a Raft message from the leader replica, perhaps with extra information on the progress of the replica bootstrap process. As long as such a bootstrapping follower replica does not fall behind the leader's WAL GC threshold, the leader replica will not evict it. If the bootstrapping follower replica does fall behind the WAL GC threshold, the leader replica will evict it and the system will start re-replicating it elsewhere. When the number of Raft transactions for a tablet is low, this approach would allow for longer bootstrapping times of tablet replicas. That might be especially beneficial when a tablet server hosting IO-heavy tablet replicas is restarted and there are few incoming updates/inserts for the tablets it hosts.
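      A minimal sketch of the leader-side decision under this proposal, assuming the follower can report a distinct "bootstrapping" status. All types and functions below are hypothetical illustrations, not part of Kudu or of ConsensusResponsePB. `min_retained_index` is assumed to denote the lowest op index still present in the leader's WAL after GC.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical sketch of the proposal; none of these names exist in Kudu.
enum class FollowerReply { kOk, kBootstrapping, kNoResponse };

struct FollowerResponse {
  FollowerReply reply;
  // Optional bootstrap progress (0-100); informational only in this sketch.
  std::optional<int> bootstrap_progress_percent;
};

enum class LeaderAction { kKeep, kEvict };

// Decides whether the leader evicts a follower (which then leads to
// re-replication elsewhere).
//  - follower_last_index: last op index the leader knows the follower has.
//  - min_retained_index: lowest op index still in the leader's WAL after GC.
//  - unavailable_timeout_expired: the follower has been silent longer than
//    the configured unavailability interval.
LeaderAction DecideOnFollower(const FollowerResponse& resp,
                              std::int64_t follower_last_index,
                              std::int64_t min_retained_index,
                              bool unavailable_timeout_expired) {
  assert(follower_last_index >= 0 && min_retained_index >= 0);
  // The follower can catch up from the leader's WAL only if the next op it
  // needs has not been garbage-collected yet.
  const bool behind_wal_gc = follower_last_index + 1 < min_retained_index;
  switch (resp.reply) {
    case FollowerReply::kBootstrapping:
      // The proposal: ignore the unavailability timeout and evict only once
      // the follower can no longer catch up from the WAL.
      return behind_wal_gc ? LeaderAction::kEvict : LeaderAction::kKeep;
    case FollowerReply::kNoResponse:
      // Timeout-based handling, as described above for silent replicas.
      return unavailable_timeout_expired ? LeaderAction::kEvict
                                         : LeaderAction::kKeep;
    case FollowerReply::kOk:
      // Normal replication path; out of scope for this sketch.
      return LeaderAction::kKeep;
  }
  return LeaderAction::kKeep;  // Unreachable; silences compiler warnings.
}
```

      For a bootstrapping follower only the WAL GC threshold bounds how long the leader waits, which is why low write traffic on a tablet translates into a longer tolerated bootstrap.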

      However, the approach above requires the Raft consensus object of a bootstrapping replica to be at least partially functional, so the tablet server would have to read at least some information about the replica from the on-disk consensus metadata before it properly bootstraps the tablet replica.
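      To answer a Raft message at all, the replica needs at least its persisted current term, since a Raft follower must compare it with the leader's term. The sketch below assumes a hypothetical stub built from a small snapshot of the on-disk consensus metadata before bootstrap starts; `ConsensusMetaSnapshot`, `BootstrappingConsensusStub` and the request/response structs are illustrative names, not Kudu classes.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <utility>

// Hypothetical subset of the on-disk consensus metadata that would have to be
// loaded before bootstrap; not Kudu's ConsensusMetadata class.
struct ConsensusMetaSnapshot {
  std::int64_t current_term;
  std::string voted_for;  // Empty if the replica has not voted in this term.
};

// Hypothetical message shapes for this sketch.
struct UpdateRequest {
  std::int64_t leader_term;
};
enum class StubStatus { kBootstrapping, kStaleLeaderTerm };
struct UpdateResponse {
  std::int64_t responder_term;
  StubStatus status;
};

// A partially functional consensus object: it cannot append or replay ops,
// but it can answer a leader using the persisted term.
class BootstrappingConsensusStub {
 public:
  explicit BootstrappingConsensusStub(ConsensusMetaSnapshot cmeta)
      : cmeta_(std::move(cmeta)) {
    assert(cmeta_.current_term >= 0);
  }

  UpdateResponse HandleUpdate(const UpdateRequest& req) const {
    // As in Raft, a request from a leader with an older term is rejected.
    if (req.leader_term < cmeta_.current_term) {
      return {cmeta_.current_term, StubStatus::kStaleLeaderTerm};
    }
    // Otherwise report that this replica is alive but still bootstrapping.
    // Adopting and persisting a newer term is left out of this sketch.
    return {cmeta_.current_term, StubStatus::kBootstrapping};
  }

 private:
  ConsensusMetaSnapshot cmeta_;
};
```

      The stub is read-only: it never adopts a higher term from the leader, which a real implementation would have to handle with care since term changes must be persisted.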


            People

            • Assignee: Vladimir Verjovkin (vladimir_committer)
            • Reporter: Alexey Serbin (aserbin)

