Kudu / KUDU-2367

Leader replica sometimes reports follower's health status as FAILED instead of FAILED_UNRECOVERABLE

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.7.0, 1.8.0
    • Fix Version/s: 1.8.0, 1.7.1
    • Component/s: tserver
    • Labels:
      None

      Description

      If a leader tablet replica detects that a follower has fallen behind the WAL segment GC threshold after the unavailability interval (defined by the --follower_unavailable_considered_failed_sec flag) has elapsed, it never reports the follower's status as FAILED_UNRECOVERABLE to the catalog manager and keeps reporting FAILED instead. In configurations where the tablet replication factor equals the total number of tablet servers in the cluster, the tablet then cannot be recovered automatically for a long time: the condition persists until a new leader is elected or the corresponding tablet servers are restarted.
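
      The expected and observed behavior can be illustrated with a minimal sketch. This is not Kudu's code. The function names, parameters, and the way the symptom is modeled (a FAILED status that is never upgraded) are assumptions made for illustration. The description does not state the root cause, and the 300-second threshold in the usage line is only an example value.

```python
from enum import Enum


class Health(Enum):
    HEALTHY = "HEALTHY"
    FAILED = "FAILED"
    FAILED_UNRECOVERABLE = "FAILED_UNRECOVERABLE"


def expected_health(secs_since_last_contact, unavailable_threshold_sec, behind_wal_gc):
    """Status the description expects the leader to report (hypothetical helper)."""
    # A follower behind the WAL GC threshold cannot catch up from the
    # leader's log, so this condition takes priority over plain FAILED.
    if behind_wal_gc:
        return Health.FAILED_UNRECOVERABLE
    # Unreachable for longer than the unavailability interval.
    if secs_since_last_contact > unavailable_threshold_sec:
        return Health.FAILED
    return Health.HEALTHY


def observed_health(previous, secs_since_last_contact, unavailable_threshold_sec, behind_wal_gc):
    """Models only the reported symptom: once FAILED, the status is never upgraded."""
    if previous is Health.FAILED:
        return Health.FAILED
    return expected_health(secs_since_last_contact, unavailable_threshold_sec, behind_wal_gc)


# Follower already marked FAILED, then falls behind WAL GC (example threshold: 300 s).
print(expected_health(400, 300, True).value)
print(observed_health(Health.FAILED, 400, 300, True).value)
```

      In the sketch, the two calls disagree for a follower that was first marked FAILED and only later fell behind the WAL GC threshold, which is the case the description reports.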

            People

            • Assignee: Alexey Serbin (aserbin)
            • Reporter: Alexey Serbin (aserbin)
