Details
- Improvement
- Status: Closed
- Major
- Resolution: Fixed
- Discovery Impl 1.2.0
- None
Description
SLING-5195 introduced a self-check that monitors whether the HeartbeatHandler is storing its heartbeats regularly. The check exists because there are several reasons why that might not be the case, e.g. the HeartbeatHandler could be blocked by another long-running commit happening locally, by thread-pool exhaustion, or by something else entirely.
The check raised an alarm when the time since the last heartbeat was greater than heartbeatTimeout. This is not sufficient, however: the comparison should be much more aggressive. It should compare against heartbeatTimeout minus 2 times heartbeatInterval to leave enough safety margin. The factor is 2 because 1 is the very minimum: this background check only runs every heartbeatInterval, so in the worst case it could run just heartbeatInterval seconds before the timeout hits and still be too late by a fraction. With 1 being the minimum, the factor of 2 adds a safety margin of only one heartbeatInterval.
Note: this also means that heartbeatTimeout should be configured to at least 4-5 times heartbeatInterval.
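The comparison described above can be sketched as follows. This is a minimal illustration, not the actual Discovery Impl code: the class name HeartbeatSelfCheck, the method names and the use of plain seconds as the unit are all assumptions made for this example.

```java
// Hypothetical sketch of the stricter self-check; names are illustrative only.
final class HeartbeatSelfCheck {

    // Number of heartbeat intervals subtracted from the timeout:
    // 1 is the bare minimum, the second one is the safety margin.
    static final long MARGIN_INTERVALS = 2;

    /** Age of the last heartbeat (in seconds) above which the alarm should fire. */
    static long alarmThresholdSeconds(long heartbeatTimeout, long heartbeatInterval) {
        return heartbeatTimeout - MARGIN_INTERVALS * heartbeatInterval;
    }

    /** Previous behaviour: alarm only once the full timeout has been exceeded. */
    static boolean oldCheckAlarms(long secondsSinceLastHeartbeat, long heartbeatTimeout) {
        return secondsSinceLastHeartbeat > heartbeatTimeout;
    }

    /** Stricter behaviour: alarm once the reduced threshold has been exceeded. */
    static boolean newCheckAlarms(long secondsSinceLastHeartbeat,
                                  long heartbeatTimeout,
                                  long heartbeatInterval) {
        return secondsSinceLastHeartbeat
                > alarmThresholdSeconds(heartbeatTimeout, heartbeatInterval);
    }
}
```

With illustrative values of heartbeatTimeout = 120 and heartbeatInterval = 30, the threshold is 60 seconds: a heartbeat that is 90 seconds old triggers the stricter check but not the old one. If heartbeatTimeout were only 2 times heartbeatInterval, the threshold would drop to 0 and the check would fire almost constantly, which is why the timeout should be several times the interval.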
Attachments
Issue Links
- is related to SLING-5284 use dedicate thread instead of scheduler (Closed)