[STORM-3121] Fix flaky metrics tests in storm-core - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: 2.0.0
Fix Version/s: 2.0.0, 1.2.3
Component/s: storm-core
Labels:
- pull-request-available

Description

The metrics tests in storm-core are flaky, but they fail only rarely. I've only seen them fail on Travis, and only when Travis is under load.

Example failures:

classname: org.apache.storm.metrics-test / testname: test-custom-metric-with-multi-tasks
expected: (clojure.core/= [1 0 0 0 0 0 2] (clojure.core/subvec (org.apache.storm.metrics-test/lookup-bucket-by-comp-id-&-metric-name! "2" "my-custom-metric") 0 N__3207__auto__))
  actual: (not (clojure.core/= [1 0 0 0 0 0 2] [1 0 0 0 0 0 0]))
      at: test_runner.clj:105

classname: org.apache.storm.metrics-test / testname: test-builtin-metrics-2
expected: (clojure.core/= [1 1] (clojure.core/subvec (org.apache.storm.metrics-test/lookup-bucket-by-comp-id-&-metric-name! "myspout" "__emit-count/default") 0 N__3207__auto__))
  actual: (not (clojure.core/= [1 1] [1 0]))
      at: test_runner.clj:105

The problem is that the tests increment metrics counters in the executor async loops, then expect the counts to end up in specific metrics buckets. A bucket is created when the metrics timer fires. The timer is included in time simulation and in LocalCluster.waitForIdle, but the executor async loop isn't. So there is no guarantee that the executor async loop gets to run before the next bucket is created when the test does a sequence like

Time.advanceClusterTime
cluster.waitForIdle

because the waitForIdle check doesn't know about the executor async loop.
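
The ordering problem described above can be modelled without any threads. The following is a minimal sketch, not Storm code: `CountMetric`, `increment`, `timerTick` and `run` are hypothetical names introduced only for illustration, and the model assumes that each timer tick closes the current bucket and resets the counter. It replays the two possible interleavings of the executor loop and the metrics timer, and reproduces the `[1 1]` versus `[1 0]` mismatch from the test-builtin-metrics-2 failure.

```java
import java.util.ArrayList;
import java.util.List;

public class BucketRaceSketch {
    /** Hypothetical stand-in for one metric: a counter that a timer drains into buckets. */
    static final class CountMetric {
        private long count = 0;
        private final List<Long> buckets = new ArrayList<>();

        // Modelled executor async loop: records one emit.
        void increment() {
            count++;
        }

        // Modelled metrics timer: closes the current bucket and resets the counter.
        void timerTick() {
            buckets.add(count);
            count = 0;
        }

        List<Long> buckets() {
            return buckets;
        }
    }

    /**
     * Replays two emits and two timer ticks.
     * The flag models whether the executor loop happened to run
     * before the second tick (nothing in the test forces this).
     */
    static List<Long> run(boolean executorRunsBeforeSecondTick) {
        CountMetric metric = new CountMetric();
        metric.increment();   // first emit is counted in time
        metric.timerTick();   // first bucket closes with value 1

        if (executorRunsBeforeSecondTick) {
            metric.increment(); // second emit counted before the tick
            metric.timerTick(); // second bucket closes with value 1
        } else {
            metric.timerTick(); // second bucket closes early with value 0
            metric.increment(); // late emit would only show up in a third bucket
        }
        return metric.buckets();
    }

    public static void main(String[] args) {
        System.out.println(run(true));   // prints [1, 1]
        System.out.println(run(false));  // prints [1, 0]
    }
}
```

In the real tests the choice between the two branches is made by thread scheduling rather than a flag, which would explain why the failure only shows up on a loaded build machine.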

Issue Links

links to

GitHub Pull Request #2735

GitHub Pull Request #2778

People

Assignee: Stig Rohde Døssing

Reporter: Stig Rohde Døssing

Votes: 0

Watchers: 2

Dates

Created: 24/Jun/18 10:44

Updated: 29/Jul/18 12:53

Resolved: 26/Jun/18 08:00

Time Tracking

Estimated: Not Specified

Remaining:

Logged: 3h 10m