Apache Storm / STORM-3321

Tests are flaky due to long timeouts in Nimbus and supervisor when using LocalCluster


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.0.0
    • Fix Version/s: 2.0.0
    • None

    Description

      Tests sometimes fail with a timeout when using LocalCluster, e.g. through Testing.completeTopology.

      The test timeout is 10 seconds, and Nimbus and the supervisor each poll for new deployments on timers that are also set to 10 seconds. A large share of the test time is therefore spent waiting for Nimbus and the supervisors to notice that the test topology has been deployed, and the test times out.

      We should reduce these timeouts to their minimums.
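
      The arithmetic behind this can be sketched with a small model. This is an illustration only, not Storm code: the class and method names are hypothetical, and it assumes the two timers act one after the other (Nimbus notices the topology first, then the supervisor notices the resulting assignment), each costing up to one full polling period in the worst case.

```java
/** Simplified, hypothetical model of polling delay versus test timeout. Not part of Storm. */
final class DeployLatencyModel {

    /**
     * Worst-case delay before the topology starts running, assuming the submission
     * lands just after each timer fired, so each stage waits one full period.
     */
    static long worstCaseDeployDelaySecs(long nimbusPollSecs, long supervisorPollSecs) {
        return nimbusPollSecs + supervisorPollSecs;
    }

    /** True if the worst-case polling delay alone can use up the whole test timeout. */
    static boolean canTimeOut(long testTimeoutSecs, long nimbusPollSecs, long supervisorPollSecs) {
        return worstCaseDeployDelaySecs(nimbusPollSecs, supervisorPollSecs) >= testTimeoutSecs;
    }

    public static void main(String[] args) {
        // Values from the issue: 10 s test timeout, 10 s timers.
        System.out.println(canTimeOut(10, 10, 10));
        // Assumed reduced timers of 1 s each (the actual minimums are not stated in the issue).
        System.out.println(canTimeOut(10, 1, 1));
    }
}
```

      Under this model a single 10-second timer is already enough to use up a 10-second test timeout, which is why shortening the polling periods removes most of the flakiness.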

      There is also a race in Nimbus that can cause test failures:

      2019-01-21 02:00:19.587 [main] WARN org.apache.storm.daemon.nimbus.Nimbus - Topology submission exception. (topology name='topologytest-45f5ad59-ec16-45a4-ba4a-eea992411cc1')
      java.lang.RuntimeException: not a leader, current leader is NimbusInfo{host='DESKTOP-AGC8TKM', port=6627, isLeader=true}
      at org.apache.storm.daemon.nimbus.Nimbus.assertIsLeader(Nimbus.java:1525) ~[classes/:?]
      at org.apache.storm.daemon.nimbus.Nimbus.submitTopologyWithOpts(Nimbus.java:2982) ~[classes/:?]
      at org.apache.storm.daemon.nimbus.Nimbus.submitTopology(Nimbus.java:2965) ~[classes/:?]
      at org.apache.storm.LocalCluster.submitTopology(LocalCluster.java:444) ~[classes/:?]
      at org.apache.storm.LocalCluster.submitTopology(LocalCluster.java:125) ~[classes/:?]
      at org.apache.storm.Testing.completeTopology(Testing.java:424) ~[classes/:?]

      The issue is that Nimbus has to acquire leadership in order to submit topologies, but LocalCluster doesn't wait for the Nimbus instance it creates to gain leadership.

      We should make LocalCluster wait for Nimbus to gain leadership.
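
      One generic way to wait for such a condition is to poll it against a deadline. The sketch below is not the change made in Storm: `LeadershipWait` and `awaitTrue` are hypothetical names, and the condition is passed in as a plain `BooleanSupplier` so the example stays independent of Nimbus internals.

```java
import java.util.function.BooleanSupplier;

/** Hypothetical helper: block until a condition holds or a deadline passes. Not part of Storm. */
final class LeadershipWait {

    /**
     * Polls the condition every pollMs milliseconds.
     * Returns true as soon as it holds, or false if timeoutMs elapses
     * (or the thread is interrupted) first.
     */
    static boolean awaitTrue(BooleanSupplier condition, long timeoutMs, long pollMs) {
        long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
        while (!condition.getAsBoolean()) {
            if (System.nanoTime() - deadline >= 0) {
                return false; // gave up: the condition never became true in time
            }
            try {
                Thread.sleep(pollMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt for the caller
                return false;
            }
        }
        return true;
    }
}
```

      A cluster setup routine could call something like `awaitTrue(leaderCheck, 10_000, 100)` before returning, where `leaderCheck` stands for whatever leadership query the Nimbus instance exposes, and fail fast with a clear message if the result is false instead of letting a later topology submission hit the "not a leader" exception.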


          People

            Assignee: Stig Rohde Døssing (srdo)
            Reporter: Stig Rohde Døssing (srdo)

              Time Tracking

              Estimated: Not Specified
              Remaining: 0h
              Logged: 1h 40m
