Sling / SLING-5310

MinEventDelayHandler should have a cancel method

Details

    • Type: Improvement
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: Discovery Commons 1.0.4
    • Fix Version/s: Discovery Commons 1.0.6
    • Component/s: Extensions
    • Labels: None

    Description

      The ViewStateManagerImpl delegates to the MinEventDelayHandler the task of delaying a TOPOLOGY_CHANGED event by a few seconds, to avoid switching too frequently when multiple instances come and go. However, when the ViewStateManagerImpl is stopped (via handleDeactivated), the MinEventDelayHandler does not notice.

      As a result it might continue in the following loop: triggerAsyncDelaying schedules a runnable to be triggered after a delay (3 seconds by default). When the runnable is triggered, it checks the state of the view. If the view is not current (which is typically the case after deactivation), it reschedules itself, on the assumption that the view will eventually become current/stable again. That is normally the case, and it is a good way to guarantee that the view change can eventually be announced. After deactivation, however, this will likely not occur, so the MinEventDelayHandler keeps spinning in this 3-second loop forever, or until the ViewStateManager is reactivated.
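
To illustrate the loop described above and what a cancel method could look like, here is a minimal sketch. It is not the actual MinEventDelayHandler code: the class name DelayedAnnouncer, its constructor parameters and the cancel() semantics are assumptions made for illustration only. Only the idea (a check that reschedules itself while the view is not current, plus a way to stop it) is taken from the description.

```java
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.function.BooleanSupplier;

/** Hypothetical stand-in for a delay handler; NOT the Sling implementation. */
class DelayedAnnouncer {
    private final ScheduledExecutorService scheduler;
    private final long delayMillis;
    private final BooleanSupplier viewIsCurrent; // assumed hook: "is the view current?"
    private final Runnable announce;             // assumed hook: "send the event"
    private final Object lock = new Object();
    private boolean cancelled;                   // guarded by lock
    private ScheduledFuture<?> pending;          // guarded by lock

    DelayedAnnouncer(ScheduledExecutorService scheduler, long delayMillis,
                     BooleanSupplier viewIsCurrent, Runnable announce) {
        this.scheduler = scheduler;
        this.delayMillis = delayMillis;
        this.viewIsCurrent = viewIsCurrent;
        this.announce = announce;
    }

    /** Schedules the delayed check; returns false once cancel() has been called. */
    boolean triggerAsyncDelaying() {
        synchronized (lock) {
            if (cancelled) {
                return false;
            }
            pending = scheduler.schedule(this::check, delayMillis, TimeUnit.MILLISECONDS);
            return true;
        }
    }

    /** Runs after the delay: announce if the view is current, otherwise try again later. */
    private void check() {
        synchronized (lock) {
            if (cancelled) {
                return; // without this check the loop would never end after deactivation
            }
        }
        if (viewIsCurrent.getAsBoolean()) {
            announce.run();
        } else {
            triggerAsyncDelaying(); // reschedules unless cancelled in the meantime
        }
    }

    /** Stops the loop: drops the pending check and refuses any further scheduling. */
    void cancel() {
        synchronized (lock) {
            cancelled = true;
            if (pending != null) {
                pending.cancel(false);
                pending = null;
            }
        }
    }
}
```

In this sketch, the owner would call cancel() from its deactivation path (the equivalent of handleDeactivated), and a test could call it during teardown so that no scheduled check survives into the next test.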

      For normal operation this behavior is not a problem at all (hence priority minor).

      For testing, however, it has the side effect that this loop carries over into subsequent tests and can interfere with them.

      One such interference has been noticed in the following failing test on Jenkins:
      https://builds.apache.org/job/sling-trunk-1.7/org.apache.sling$org.apache.sling.discovery.impl/2751/testReport/org.apache.sling.discovery.impl.common.heartbeat/HeartbeatTest/testPartitioning/

      java.lang.AssertionError: expected:<TOPOLOGY_INIT> but was:<TOPOLOGY_CHANGED>
      	at org.junit.Assert.fail(Assert.java:88)
      	at org.junit.Assert.failNotEquals(Assert.java:743)
      	at org.junit.Assert.assertEquals(Assert.java:118)
      	at org.junit.Assert.assertEquals(Assert.java:144)
      	at org.apache.sling.discovery.impl.common.heartbeat.HeartbeatTest.doTestPartitioning(HeartbeatTest.java:285)
      	at org.apache.sling.discovery.impl.common.heartbeat.HeartbeatTest.testPartitioning(HeartbeatTest.java:143)
      

      where one 'issue heartbeat' operation triggered from doTestPartitioning lasted over 5 seconds:

      17.11.2015 23:50:37.033 *DEBUG* [main] DiscoveryServiceImpl: updateProperties: done.
      17.11.2015 23:50:37.033 *DEBUG* [main] HeartbeatHandler: issueClusterLocalHeartbeat: storing cluster-local heartbeat to repository for fe88cbb1-f967-48c5-a58d-30fd137909cc
      17.11.2015 23:50:42.707 *DEBUG* [main] HeartbeatHandler: issueConnectorPings: not issuing remote heartbeat yet, startup not yet finished
      17.11.2015 23:50:42.724 *DEBUG* [main] fe88cbb1-f967-48c5-a58d-30fd137909cc: analyzeVotings: start. slingId: fe88cbb1-f967-48c5-a58d-30fd137909cc
      17.11.2015 23:50:43.081 *DEBUG* [main] VotingHelper: listVotings: votings found: 0
      17.11.2015 23:50:43.081 *DEBUG* [main] fe88cbb1-f967-48c5-a58d-30fd137909cc: analyzeVotings: no ongoing votings at the moment. done.
      17.11.2015 23:50:43.082 *DEBUG* [main] HeartbeatHandler: doCheckView: established view matches with expected.
      17.11.2015 23:50:43.082 *DEBUG* [main] HeartbeatHandler: doCheckViewWith: no pending nor winning votes. view is fine. we're all happy.
      

      The only explanation found so far is that the thread pool that should normally process background jobs was busy with all the scheduled jobs left over from previous tests.

          People

            Assignee: Stefan Egli (stefanegli)
            Reporter: Stefan Egli (stefanegli)