Details
Type: Improvement
Status: Closed
Priority: Minor
Resolution: Fixed
Description
We've seen heavily loaded nodes whose health checks periodically time out. The timeout is treated as a health check failure, which leads to workers being killed unnecessarily.
I'd like an option to not fail the health check when a timeout occurs, and a metric to track how often these timeouts occur.
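
To illustrate the requested behavior, here is a minimal sketch, not the project's actual implementation. The class HealthCheckRunner, the fail_on_timeout option, and the timeout_count counter are all hypothetical names chosen for this example. It assumes the health check is an external command whose exit code signals health.

```python
import subprocess
import sys
from dataclasses import dataclass
from typing import Sequence


@dataclass
class HealthCheckRunner:
    """Hypothetical runner; names are illustrative, not a real API."""
    timeout_s: float
    fail_on_timeout: bool = True   # assumed option: False tolerates timeouts
    timeout_count: int = 0         # assumed metric: number of timed-out checks

    def run(self, cmd: Sequence[str]) -> bool:
        """Return True if the node should be treated as healthy."""
        try:
            result = subprocess.run(
                list(cmd),
                timeout=self.timeout_s,
                stdout=subprocess.DEVNULL,
                stderr=subprocess.DEVNULL,
            )
        except subprocess.TimeoutExpired:
            # Always record the timeout so it stays visible in metrics,
            # even when it is not treated as a failure.
            self.timeout_count += 1
            return not self.fail_on_timeout
        # A check that finished in time is judged by its exit code.
        return result.returncode == 0


if __name__ == "__main__":
    slow_check = [sys.executable, "-c", "import time; time.sleep(5)"]
    runner = HealthCheckRunner(timeout_s=0.5, fail_on_timeout=False)
    print("healthy:", runner.run(slow_check), "timeouts:", runner.timeout_count)
```

With fail_on_timeout set to False, a slow check on a busy node no longer causes workers to be killed, while the counter still shows how often checks are timing out. A check that completes but exits non-zero is still reported as unhealthy.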