Kudu / KUDU-1407

Leader should evict a failed follower stuck in the TABLET_NOT_RUNNING state

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 0.8.0
    • Fix Version/s: 1.5.0
    • Component/s: consensus
    • Labels:
      None

      Description

      It seems that if the leader gets an error from one of its followers because the tablet is not running, it treats that replica as 'unresponsive'. If this persists for 5 minutes, the leader evicts the follower so that a new replica can be created.

      This can cause problems at cluster startup when there is a lot of data and a cold disk cache: the bootstrap process might take more than 5 minutes, and leaders might end up evicting followers that are perfectly healthy (just in the process of coming up).
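
      The eviction rule described above can be sketched as follows. This is a minimal illustration, not Kudu code: every name here (ReplicaState, FollowerStatus, evicts_as_described, evicts_with_state_check) is hypothetical, and the state check in the second function is only one assumed way to tell a bootstrapping follower apart from a failed one.

```python
from dataclasses import dataclass
from enum import Enum

# All names below are hypothetical; none are taken from the Kudu codebase.
EVICTION_TIMEOUT_SECS = 5 * 60  # the 5-minute window mentioned in the description


class ReplicaState(Enum):
    RUNNING = "running"
    BOOTSTRAPPING = "bootstrapping"  # still coming up at startup
    FAILED = "failed"                # assumed not to recover on its own


@dataclass
class FollowerStatus:
    state: ReplicaState
    secs_since_last_success: float  # time since the leader last got a successful response


def evicts_as_described(follower: FollowerStatus) -> bool:
    """Behavior described in the report: any error counts as 'unresponsive'."""
    return follower.secs_since_last_success >= EVICTION_TIMEOUT_SECS


def evicts_with_state_check(follower: FollowerStatus) -> bool:
    """Assumed alternative: never evict a replica that is still bootstrapping."""
    if follower.state is ReplicaState.BOOTSTRAPPING:
        return False
    return follower.secs_since_last_success >= EVICTION_TIMEOUT_SECS


if __name__ == "__main__":
    # A healthy follower whose bootstrap has taken 6 minutes.
    slow_start = FollowerStatus(ReplicaState.BOOTSTRAPPING, secs_since_last_success=6 * 60)
    print(evicts_as_described(slow_start))      # True: evicted despite being healthy
    print(evicts_with_state_check(slow_start))  # False: left alone while it comes up
```

      Under these assumptions, a follower that has been bootstrapping for 6 minutes is evicted by the first rule but not by the second, while a follower in the hypothetical FAILED state is still evicted once the timeout elapses.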

              People

              • Assignee: Andrew Wong (awong)
              • Reporter: Todd Lipcon (tlipcon)
