Details
-
Bug
-
Status: Resolved
-
Major
-
Resolution: Fixed
-
2.0.0
-
None
Description
Tests will sometimes fail with timeout when using e.g. Testing.completeTopology.
The issue is that the timeout is 10 seconds, and Nimbus and the supervisor both have timers that monitor for new deployments that are also set to 10 seconds. This causes tests to time out because a lot of the test time is wasted waiting for Nimbus/the supervisors to catch that the test topology is deployed.
We should reduce these timeouts to their minimums.
There is also a race in Nimbus that can cause test failures
2019-01-21 02:00:19.587 [main] WARN org.apache.storm.daemon.nimbus.Nimbus - Topology submission exception. (topology name='topologytest-45f5ad59-ec16-45a4-ba4a-eea992411cc1')
java.lang.RuntimeException: not a leader, current leader is NimbusInfoUnknown macro: {host='DESKTOP-AGC8TKM', port=6627, isLeader=true}at org.apache.storm.daemon.nimbus.Nimbus.assertIsLeader(Nimbus.java:1525) ~[classes/:?]
at org.apache.storm.daemon.nimbus.Nimbus.submitTopologyWithOpts(Nimbus.java:2982) ~[classes/:?]
at org.apache.storm.daemon.nimbus.Nimbus.submitTopology(Nimbus.java:2965) ~[classes/:?]
at org.apache.storm.LocalCluster.submitTopology(LocalCluster.java:444) ~[classes/:?]
at org.apache.storm.LocalCluster.submitTopology(LocalCluster.java:125) ~[classes/:?]
at org.apache.storm.Testing.completeTopology(Testing.java:424) ~[classes/:?]
The issue is that Nimbus has to acquire leadership in order to submit topologies, but LocalCluster doesn't wait for the Nimbus instance it creates to gain leadership.
We should make LocalCluster wait for Nimbus to gain leadership.
Attachments
Issue Links
- links to