Details
- Bug
- Status: Closed
- Major
- Resolution: Fixed
- Discovery Impl 1.1.0
- None
Description
VotingHandler.analyzeVotings risks running into a busy voting loop for as long as a voting is ongoing under /ongoingVotings:
- analyzeVotings is invoked either when something changes in /var/discovery/impl/ongoingVotings or as part of a heartbeat
- as part of this, it determines the ongoing votings, decides which one it has to vote on (should there be more than one; it does not vote if it was the initiator), then potentially does a vote(,true) on it
- vote() not only sets the vote property accordingly (true or false); since SLING-3434 it additionally sets the votedAt property (a timestamp)
- since the above is a change under /ongoingVotings, it triggers an observation event, which in turn causes analyzeVotings to be called again; that call again finds an ongoing voting, votes on it, triggers another observation event, and so on. The result is an endless loop that involves the repository and an observation handler
- the above loop only occurs since the introduction of the votedAt property, as its value changes on each iteration. Without it, voting again with the same boolean would not result in an observation event, and the loop would not happen at all.
- in any case, the loop lasts at most until the initiator has received all the votes and can promote the voting to an /establishedView. This should typically be a fast operation, but if the cluster is under heavy load and experiences delays for some reason, the busy loop can last a little while.
So this loop is a regression introduced with SLING-3434.
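The mechanism described above can be illustrated with a small, self-contained simulation. This is only a sketch under stated assumptions: VotingNode, voteUnconditionally, voteIfChanged and runRounds are hypothetical names invented for illustration, the "repository" is a plain in-memory map, and the guard shown (skipping the write when the same vote is already recorded) is one possible way to break the loop, not necessarily the fix that was applied for this issue.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

public class VoteLoopSketch {

    /** Hypothetical stand-in for a voting node in the repository. */
    public static class VotingNode {
        final Map<String, Object> props = new HashMap<>();

        /** Returns true only if the value differed, i.e. an observation event would fire. */
        boolean setProperty(String name, Object value) {
            Object old = props.put(name, value);
            return !Objects.equals(old, value);
        }
    }

    /** Behaviour described in the issue: vote and votedAt are written on every call. */
    public static boolean voteUnconditionally(VotingNode node, boolean vote, long now) {
        boolean changed = node.setProperty("vote", vote);
        // votedAt differs on every call, so this write always counts as a change
        changed |= node.setProperty("votedAt", now);
        return changed;
    }

    /** Assumed guard (illustration only): do nothing if the same vote is already recorded. */
    public static boolean voteIfChanged(VotingNode node, boolean vote, long now) {
        if (Boolean.valueOf(vote).equals(node.props.get("vote"))) {
            return false; // no write, hence no observation event
        }
        return voteUnconditionally(node, vote, now);
    }

    /**
     * Simulates analyzeVotings being re-triggered by the change events of its own
     * vote. Stops when a round causes no change or maxRounds is reached, and
     * returns the number of rounds executed.
     */
    public static int runRounds(VotingNode node, boolean guarded, int maxRounds) {
        int rounds = 0;
        long clock = 1000L; // fake timestamp source, advanced once per round
        boolean changed = true;
        while (changed && rounds < maxRounds) {
            rounds++;
            clock++;
            changed = guarded
                    ? voteIfChanged(node, true, clock)
                    : voteUnconditionally(node, true, clock);
        }
        return rounds;
    }

    public static void main(String[] args) {
        System.out.println("unguarded rounds: " + runRounds(new VotingNode(), false, 50));
        System.out.println("guarded rounds: " + runRounds(new VotingNode(), true, 50));
    }
}
```

In this simulation the unguarded variant keeps going until the round cap is hit, because every round changes votedAt, whereas the guarded variant stops after the second round, when the repeated vote no longer produces a change.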
Issue Links
- is broken by: SLING-3434 Make intra-cluster discovery-heartbeats independent from machine clock differences (Closed)