Details
- Type: New Feature
- Status: Resolved
- Priority: Major
- Resolution: Fixed
- 1.7.0
- None
Description
Currently, monitoring systems can report that the Kudu cluster is perfectly healthy when in fact some tablet has gotten "stuck" with no leader (e.g., due to a network connectivity problem or a bug). If we exposed a numeric metric on each tablet, such as the time since the replica was last healthy or the number of failed election attempts, we could easily monitor for this case.
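As a rough illustration of how a monitoring system might consume such metrics, here is a minimal sketch. The metric names (`failed_elections`, `ms_since_healthy`), the thresholds, and the shape of the input dictionary are all hypothetical, chosen only for this example; they are not taken from Kudu's actual metrics.

```python
from typing import Dict, List

# Hypothetical metric names, used only for illustration.
FAILED_ELECTIONS = "failed_elections"
MS_SINCE_HEALTHY = "ms_since_healthy"


def find_stuck_tablets(
    tablet_metrics: Dict[str, Dict[str, int]],
    max_failed_elections: int = 5,
    max_ms_since_healthy: int = 60_000,
) -> List[str]:
    """Return sorted IDs of tablets whose metrics exceed either threshold."""
    stuck = []
    for tablet_id, metrics in tablet_metrics.items():
        # A missing metric is treated as 0, i.e. "no evidence of trouble".
        failed = metrics.get(FAILED_ELECTIONS, 0)
        stale_ms = metrics.get(MS_SINCE_HEALTHY, 0)
        if failed > max_failed_elections or stale_ms > max_ms_since_healthy:
            stuck.append(tablet_id)
    return sorted(stuck)


# Made-up sample data: one healthy tablet, one that looks leaderless.
sample = {
    "tablet-a": {FAILED_ELECTIONS: 0, MS_SINCE_HEALTHY: 150},
    "tablet-b": {FAILED_ELECTIONS: 12, MS_SINCE_HEALTHY: 300_000},
}
print(find_stuck_tablets(sample))  # prints ['tablet-b']
```

An alert on either threshold would catch a tablet that has no leader even while every server process still reports itself as up.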
Issue Links
- is related to: KUDU-2288 Client should fail fast upon access to an unavailable tablet (In Progress)