Kudu / KUDU-2367

Leader replica sometimes reports follower's health status as FAILED instead of FAILED_UNRECOVERABLE

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.7.0, 1.8.0
    • Fix Version/s: 1.8.0, 1.7.1
    • Component/s: tserver
    • Labels: None

    Description

      If a leader tablet replica detects that a follower has fallen behind the WAL segment GC threshold after the unavailability interval (defined by the --follower_unavailable_considered_failed_sec flag) has elapsed, it never reports the follower's status as FAILED_UNRECOVERABLE to the catalog manager and keeps reporting FAILED instead. In configurations where the tablet replication factor equals the total number of tablet servers in the cluster, the tablet then cannot be recovered automatically for a long time: the situation persists until a new leader is elected or the corresponding tablet servers are restarted.
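
      The expected behavior can be illustrated with a minimal sketch. This is not Kudu's actual code (Kudu is written in C++); the function name, parameter names, and the enum below are hypothetical and only model the statuses named in this report. The point it shows is that a follower that can no longer catch up from the leader's WAL should be reported as FAILED_UNRECOVERABLE even if it was already being reported as FAILED because the unavailability interval had elapsed.

```python
from enum import Enum


class HealthStatus(Enum):
    # Hypothetical enum mirroring the status names used in this report.
    HEALTHY = "HEALTHY"
    FAILED = "FAILED"
    FAILED_UNRECOVERABLE = "FAILED_UNRECOVERABLE"


def expected_health_status(seconds_since_last_contact: float,
                           unavailable_threshold_sec: float,
                           wal_catchup_possible: bool) -> HealthStatus:
    """Return the status a leader would be expected to report for a follower.

    Assumption: unavailable_threshold_sec stands in for the value of
    --follower_unavailable_considered_failed_sec, and wal_catchup_possible
    is False once the follower is behind the WAL segment GC threshold.
    """
    # The unrecoverable condition is checked first, so it takes precedence
    # no matter whether the follower was already considered FAILED.
    if not wal_catchup_possible:
        return HealthStatus.FAILED_UNRECOVERABLE
    if seconds_since_last_contact > unavailable_threshold_sec:
        return HealthStatus.FAILED
    return HealthStatus.HEALTHY


# Scenario from the description: the follower is first unreachable past the
# interval, and only later falls behind the WAL segment GC threshold.
before_gc = expected_health_status(400, 300, wal_catchup_possible=True)
after_gc = expected_health_status(900, 300, wal_catchup_possible=False)
```

      In this sketch, before_gc is FAILED and after_gc is FAILED_UNRECOVERABLE; the bug described above corresponds to the second report remaining FAILED.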

          People

            Assignee: Alexey Serbin (aserbin)
            Reporter: Alexey Serbin (aserbin)