Details
Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
None
Description
For debugging heartbeat failures (or non-failures) it would be useful to log enough information to infer the current state of the failure detector from logs. Specifically:
- Upon a failure, we should log the number of consecutive failures according to the failure detector, and possibly also how many more failures remain until it is considered failed.
- We should log when the failure count is reset to 0 by a successful heartbeat.
Currently, if there are occasional failures, it is hard to tell with certainty whether the failure count was reset correctly.
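A minimal sketch of the logging requested above, not the project's actual failure detector. The class name `MissedHeartbeatDetector`, the `max_missed` threshold, the `update` method, and the message wording are all hypothetical and chosen only for illustration. It assumes a peer is considered failed after a fixed number of consecutive missed heartbeats.

```python
import logging

logger = logging.getLogger("failure_detector")


class MissedHeartbeatDetector:
    """Hypothetical detector: a peer is considered failed after
    max_missed consecutive missed heartbeats (an assumed policy)."""

    def __init__(self, max_missed):
        self.max_missed = max_missed
        self.missed = {}  # peer id -> consecutive missed heartbeats

    def update(self, peer, ok):
        """Record one heartbeat result; return True if the peer is now considered failed."""
        previous = self.missed.get(peer, 0)
        if ok:
            # Log the reset only when there was something to reset,
            # so healthy peers do not flood the log.
            if previous > 0:
                logger.info(
                    "Heartbeat to %s succeeded; resetting failure count from %d to 0",
                    peer, previous)
            self.missed[peer] = 0
            return False

        count = previous + 1
        self.missed[peer] = count
        remaining = max(self.max_missed - count, 0)
        logger.warning(
            "Heartbeat to %s failed: %d consecutive failure(s), "
            "%d more until considered failed",
            peer, count, remaining)
        return count >= self.max_missed


# Example: two misses, a success that resets the count, then another miss.
logging.basicConfig(level=logging.INFO)
detector = MissedHeartbeatDetector(max_missed=3)
for ok in [False, False, True, False]:
    detector.update("subscriber-1", ok)
```

With messages like these, each failure line carries the detector's current count, and each reset line records the count it cleared, so the detector's state can be reconstructed from the log alone.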