Details
-
Bug
-
Status: Resolved
-
Major
-
Resolution: Fixed
-
2.0.0
Description
The tests are flaky, but only rarely fail. I've only seen them fail on Travis when Travis is under load.
Example failures:
classname: org.apache.storm.metrics-test / testname: test-custom-metric-with-multi-tasks expected: (clojure.core/= [1 0 0 0 0 0 2] (clojure.core/subvec (org.apache.storm.metrics-test/lookup-bucket-by-comp-id-&-metric-name! "2" "my-custom-metric") 0 N__3207__auto__)) actual: (not (clojure.core/= [1 0 0 0 0 0 2] [1 0 0 0 0 0 0])) at: test_runner.clj:105
classname: org.apache.storm.metrics-test / testname: test-builtin-metrics-2 expected: (clojure.core/= [1 1] (clojure.core/subvec (org.apache.storm.metrics-test/lookup-bucket-by-comp-id-&-metric-name! "myspout" "__emit-count/default") 0 N__3207__auto__)) actual: (not (clojure.core/= [1 1] [1 0])) at: test_runner.clj:105
The problem is that the tests increment metrics counters in the executor async loops, then expect the counters to end up in exact metrics buckets. The creation of a bucket is triggered by the metrics timer. The timer is included in time simulation and LocalCluster.waitForIdle, but the executor async loop isn't. There isn't any guarantee that the executor async loop gets to run when the test does a sequence like
Time.advanceClusterTime cluster.waitForIdle
because the waitForIdle check doesn't know about the executor async loop.