Kudu / KUDU-1407

Leader should evict a failed follower stuck in the TABLET_NOT_RUNNING state


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 0.8.0
    • Fix Version/s: 1.5.0
    • Component/s: consensus
    • Labels: None

    Description

      It seems that when the leader receives a TABLET_NOT_RUNNING error from one of its followers, it treats that replica as 'unresponsive'. If the error persists for 5 minutes, the leader evicts the follower and tries to create a new replica to replace it.
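A minimal Python sketch of the behavior described above, for illustration only: `FollowerTracker`, `ResponseStatus`, and `EVICTION_TIMEOUT_SEC` are hypothetical names, not Kudu's actual consensus code (which is written in C++), and the 300-second threshold simply mirrors the 5 minutes mentioned in the description. It shows why a follower that keeps answering TABLET_NOT_RUNNING looks, to the leader, exactly like one that never answers.

```python
from dataclasses import dataclass
from enum import Enum, auto


class ResponseStatus(Enum):
    # Hypothetical subset of the statuses a follower could return.
    OK = auto()
    TABLET_NOT_RUNNING = auto()


# Assumed threshold, chosen to match the "5 minutes" in the description.
EVICTION_TIMEOUT_SEC = 300.0


@dataclass
class FollowerTracker:
    """Hypothetical leader-side bookkeeping for a single follower."""
    last_success_time: float

    def record_response(self, status: ResponseStatus, now: float) -> None:
        # Only a successful response resets the clock, so TABLET_NOT_RUNNING
        # is treated the same as receiving no response at all.
        if status is ResponseStatus.OK:
            self.last_success_time = now

    def should_evict(self, now: float) -> bool:
        return now - self.last_success_time >= EVICTION_TIMEOUT_SEC


# A follower that is still bootstrapping answers every 10 seconds.
tracker = FollowerTracker(last_success_time=0.0)
for t in range(10, 301, 10):
    tracker.record_response(ResponseStatus.TABLET_NOT_RUNNING, now=float(t))
print(tracker.should_evict(now=300.0))  # True, although the follower is healthy
```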

      This can cause problems at cluster startup when there is a lot of data and the disk cache is cold: tablet bootstrap may take longer than five minutes, and leaders may end up evicting followers that are perfectly healthy and simply still coming up.
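One way to express the policy the title asks for, again as a hypothetical sketch rather than the actual KUDU-1407 fix: assume a follower can report why its tablet is not running (still bootstrapping vs. failed). Any reply then counts as proof that the follower is alive, so a bootstrapping replica is never evicted just for being slow, while a replica that reports a failed tablet is evicted right away. `TabletState`, `FollowerTracker`, and `EVICTION_TIMEOUT_SEC` are illustrative names; an unanswered RPC is modeled by simply not calling `record_response`.

```python
from enum import Enum, auto


class TabletState(Enum):
    # Hypothetical states a follower might report alongside its reply.
    RUNNING = auto()
    BOOTSTRAPPING = auto()
    FAILED = auto()


# Assumed timeout for followers that stop answering entirely.
EVICTION_TIMEOUT_SEC = 300.0


class FollowerTracker:
    """Hypothetical leader-side bookkeeping for a single follower."""

    def __init__(self, now: float) -> None:
        self.last_heard_time = now
        self.failed = False

    def record_response(self, state: TabletState, now: float) -> None:
        # Any reply, including "not running yet", proves the server is up.
        self.last_heard_time = now
        # Only a tablet that reports a failure is marked for eviction.
        self.failed = state is TabletState.FAILED

    def should_evict(self, now: float) -> bool:
        # Evict a failed replica immediately, or one that has been silent too long.
        return self.failed or now - self.last_heard_time >= EVICTION_TIMEOUT_SEC


bootstrapping = FollowerTracker(now=0.0)
for t in range(10, 601, 10):
    bootstrapping.record_response(TabletState.BOOTSTRAPPING, now=float(t))
print(bootstrapping.should_evict(now=600.0))  # False: slow but alive

failed = FollowerTracker(now=0.0)
failed.record_response(TabletState.FAILED, now=10.0)
print(failed.should_evict(now=10.0))  # True: stuck in a failed state
```

The design choice here is to separate "is the server reachable?" (the timeout) from "is this replica usable?" (the reported state), which is the distinction the original policy collapses.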


People

    • Assignee: Andrew Wong (awong)
    • Reporter: Todd Lipcon (tlipcon)
