Details
Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
1.7.0, 1.8.0
None
Description
If a leader tablet replica detects that a follower has fallen behind the WAL segment GC threshold after the unavailability interval (defined by the --follower_unavailable_considered_failed_sec flag) has already elapsed, it never reports the follower's status as FAILED_UNRECOVERABLE to the catalog manager and continues to report FAILED instead. In configurations where the tablet replication factor equals the total number of tablet servers in the cluster, the tablet then cannot be recovered automatically for a long time: the situation persists until a new leader is elected or the corresponding tablet servers are restarted.
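The ordering dependency described above can be illustrated with a small toy model. This is only a sketch and not Kudu's actual code: the `Health` enum, the `next_health` function, and the `sticky_failed` switch are hypothetical names introduced here. It assumes the reported behavior amounts to the FAILED status never being upgraded once it has been set.

```python
from enum import Enum


class Health(Enum):
    HEALTHY = "HEALTHY"
    FAILED = "FAILED"
    FAILED_UNRECOVERABLE = "FAILED_UNRECOVERABLE"


def next_health(prev, timed_out, behind_wal_gc, sticky_failed):
    """Toy model of the follower health a leader would report.

    prev          -- health reported previously
    timed_out     -- the unavailability interval has elapsed
    behind_wal_gc -- the follower is behind the WAL segment GC threshold
    sticky_failed -- True models the reported bug (FAILED is never upgraded);
                     False models the expected behavior
    """
    stuck = sticky_failed and prev is Health.FAILED
    if behind_wal_gc and not stuck:
        return Health.FAILED_UNRECOVERABLE
    if timed_out or prev is Health.FAILED:
        return Health.FAILED
    return prev


def replay(sticky_failed):
    """Replay the sequence from the report: timeout first, WAL GC second."""
    h = Health.HEALTHY
    h = next_health(h, timed_out=True, behind_wal_gc=False,
                    sticky_failed=sticky_failed)
    h = next_health(h, timed_out=True, behind_wal_gc=True,
                    sticky_failed=sticky_failed)
    return h


if __name__ == "__main__":
    print("reported behavior:", replay(sticky_failed=True).value)
    print("expected behavior:", replay(sticky_failed=False).value)
```

In this model the follower only gets stuck when the timeout is observed before the WAL GC condition, which matches the "after the unavailability interval" wording of the report.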