Uploaded image for project: 'Apache Storm'
  1. Apache Storm
  2. STORM-2809

Integration test is failing consistently and topologies sometimes fail to start workers

Attach filesAttach ScreenshotVotersWatch issueWatchersCreate sub-taskLinkCloneUpdate Comment AuthorReplace String in CommentUpdate Comment VisibilityDelete Comments
    XMLWordPrintableJSON

Details

    • Bug
    • Status: Resolved
    • Blocker
    • Resolution: Fixed
    • 2.0.0
    • 2.0.0
    • None

    Description

      The integration test has been failing fairly consistently since https://github.com/apache/storm/pull/2363. I tried running the test outside a VM with a locally installed Storm setup, and it has failed every time for me.

      Most runs seem to fail in ways that make it look like the integration test is just flaky (e.g. tuple windows not matching the calculated window), but in at least a few tests I saw the topology get submitted to Nimbus followed by about 3 minutes of nothing happening. The workers never started and the supervisor didn't seem aware of the scheduling. The only evidence that the topology was submitted was in the Nimbus log. This still happens even if the test topologies are killed with a timeout of 0, so there should be slots free for the next test immediately.

      I tried reverting https://github.com/apache/storm/pull/2363 and it seems to make the integration test pass much more often. Over 5 runs there was still an instance of a supervisor failing to start the workers, but the other 4 passed.

      We should try to fix whatever is causing the supervisor to fail to start workers, and get the integration test more stable.

      Attachments

        Activity

          This comment will be Viewable by All Users Viewable by All Users
          Cancel

          People

            revans2 Robert Joseph Evans
            srdo Stig Rohde Døssing
            Votes:
            0 Vote for this issue
            Watchers:
            2 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved:

              Time Tracking

                Estimated:
                Original Estimate - Not Specified
                Not Specified
                Remaining:
                Remaining Estimate - 0h
                0h
                Logged:
                Time Spent - 1h 20m
                1h 20m

                Slack

                  Issue deployment