Details
Type: Improvement
Status: Closed
Priority: Minor
Resolution: Fixed
Discovery Commons 1.0.4
None
Description
The ViewStateManagerImpl delegates to the MinEventDelayHandler the task of delaying a TOPOLOGY_CHANGED event by a few seconds, which avoids switching too frequently when multiple instances come and go. However, when the ViewStateManagerImpl is stopped (via handleDeactivated), the MinEventDelayHandler does not notice. As a result, it may continue in the following loop: triggerAsyncDelaying schedules a runnable to be triggered after a delay (3 seconds by default). When the runnable is triggered, it checks the state of the view. If the view is not current (which is typically the case after deactivation), it reschedules itself, on the assumption that the view will eventually become current/stable again.
That assumption normally holds, and rescheduling is a good way to guarantee that the view change can eventually be announced. After deactivation, however, the view will likely not become current again, so the MinEventDelayHandler keeps spinning in this 3-second loop forever, or until the ViewStateManager is reactivated.
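The loop described above, and one possible way to end it, can be sketched as follows. This is a simplified model, not the actual MinEventDelayHandler code or the fix that was applied: ManualScheduler, DelayHandler, the stopped flag and stop() are all hypothetical names introduced for illustration, and the 3-second delay is replaced by an explicit tick() call so that the behavior is deterministic.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.BooleanSupplier;

public class DelayLoopSketch {

    /** Hypothetical stand-in for a scheduler: queued tasks run only when tick() is called. */
    static final class ManualScheduler {
        private final Deque<Runnable> queue = new ArrayDeque<>();

        void schedule(Runnable task) {
            queue.add(task);
        }

        /** Runs the tasks that were queued before this call; tasks queued meanwhile wait for the next tick. */
        void tick() {
            int due = queue.size();
            for (int i = 0; i < due; i++) {
                queue.poll().run();
            }
        }

        int pending() {
            return queue.size();
        }
    }

    /** Simplified model of the delay handler (assumption: not the real implementation). */
    static final class DelayHandler {
        private final ManualScheduler scheduler;
        private final BooleanSupplier viewIsCurrent;
        private boolean stopped;   // hypothetical flag, set on deactivation
        private int announced;     // number of view changes announced so far

        DelayHandler(ManualScheduler scheduler, BooleanSupplier viewIsCurrent) {
            this.scheduler = scheduler;
            this.viewIsCurrent = viewIsCurrent;
        }

        void triggerAsyncDelaying() {
            scheduler.schedule(this::onDelayElapsed);
        }

        private void onDelayElapsed() {
            if (stopped) {
                return; // sketched fix: a stopped handler does not reschedule itself
            }
            if (viewIsCurrent.getAsBoolean()) {
                announced++; // view is stable: announce the change and end the loop
                return;
            }
            triggerAsyncDelaying(); // view not current yet: try again after the next delay
        }

        void stop() {
            stopped = true;
        }

        int announcedCount() {
            return announced;
        }
    }

    public static void main(String[] args) {
        ManualScheduler scheduler = new ManualScheduler();
        // The view never becomes current, as is typical after deactivation.
        DelayHandler handler = new DelayHandler(scheduler, () -> false);
        handler.triggerAsyncDelaying();
        for (int i = 0; i < 5; i++) {
            scheduler.tick();
        }
        System.out.println("pending before stop: " + scheduler.pending());
        handler.stop();
        scheduler.tick();
        System.out.println("pending after stop: " + scheduler.pending());
    }
}
```

Without the stopped check, one task stays pending after every tick for as long as the view is not current. With it, the first tick after stop() drains the queue. A real handler running on a thread pool would also have to make the flag thread-safe.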
For normal operations this behavior is not a problem at all (hence priority minor).
For testing, however, it has the side effect that the loop spans into subsequent tests and can interfere with them.
One such interference was noticed in the following failing test on Jenkins:
https://builds.apache.org/job/sling-trunk-1.7/org.apache.sling$org.apache.sling.discovery.impl/2751/testReport/org.apache.sling.discovery.impl.common.heartbeat/HeartbeatTest/testPartitioning/
java.lang.AssertionError: expected:<TOPOLOGY_INIT> but was:<TOPOLOGY_CHANGED>
    at org.junit.Assert.fail(Assert.java:88)
    at org.junit.Assert.failNotEquals(Assert.java:743)
    at org.junit.Assert.assertEquals(Assert.java:118)
    at org.junit.Assert.assertEquals(Assert.java:144)
    at org.apache.sling.discovery.impl.common.heartbeat.HeartbeatTest.doTestPartitioning(HeartbeatTest.java:285)
    at org.apache.sling.discovery.impl.common.heartbeat.HeartbeatTest.testPartitioning(HeartbeatTest.java:143)
where one 'issue heartbeat' operation triggered from doTestPartitioning lasted over 5 seconds:
17.11.2015 23:50:37.033 *DEBUG* [main] DiscoveryServiceImpl: updateProperties: done.
17.11.2015 23:50:37.033 *DEBUG* [main] HeartbeatHandler: issueClusterLocalHeartbeat: storing cluster-local heartbeat to repository for fe88cbb1-f967-48c5-a58d-30fd137909cc
17.11.2015 23:50:42.707 *DEBUG* [main] HeartbeatHandler: issueConnectorPings: not issuing remote heartbeat yet, startup not yet finished
17.11.2015 23:50:42.724 *DEBUG* [main] fe88cbb1-f967-48c5-a58d-30fd137909cc: analyzeVotings: start. slingId: fe88cbb1-f967-48c5-a58d-30fd137909cc
17.11.2015 23:50:43.081 *DEBUG* [main] VotingHelper: listVotings: votings found: 0
17.11.2015 23:50:43.081 *DEBUG* [main] fe88cbb1-f967-48c5-a58d-30fd137909cc: analyzeVotings: no ongoing votings at the moment. done.
17.11.2015 23:50:43.082 *DEBUG* [main] HeartbeatHandler: doCheckView: established view matches with expected.
17.11.2015 23:50:43.082 *DEBUG* [main] HeartbeatHandler: doCheckViewWith: no pending nor winning votes. view is fine. we're all happy.
The only explanation found so far is that the thread pool that should normally process background jobs was busy with all the scheduled jobs left over from previous tests.
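The general mechanism behind this explanation can be illustrated with a small, self-contained sketch. It does not reproduce the Sling test setup: the class name, the single-threaded pool, and the sleeping stand-in tasks are all assumptions chosen only to show how leftover work in a shared pool delays a newly submitted job.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

public class PoolStarvationSketch {

    /**
     * Fills a single-threaded pool with leftover tasks, then measures how long
     * a newly submitted job waits before it starts running.
     */
    static long queueDelayMillis(int leftoverTasks, long taskMillis) {
        ExecutorService pool = Executors.newFixedThreadPool(1);
        try {
            for (int i = 0; i < leftoverTasks; i++) {
                // Stand-in for a job left over from an earlier test.
                pool.submit(() -> {
                    try {
                        Thread.sleep(taskMillis);
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
            }
            long submittedAt = System.nanoTime();
            // The new job only records the time at which it actually starts.
            Callable<Long> probe = System::nanoTime;
            Future<Long> startedAt = pool.submit(probe);
            return TimeUnit.NANOSECONDS.toMillis(startedAt.get() - submittedAt);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) {
        long delay = queueDelayMillis(3, 50);
        System.out.println("new job waited about " + delay + " ms behind leftover tasks");
    }
}
```

The new job cannot start until every leftover task has finished, so its waiting time grows with the amount of leftover work. Whether this is what happened in the failing build is not confirmed by the logs above.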