Apache Storm / STORM-3121

Fix flaky metrics tests in storm-core

Details

    Description

      The metrics tests in storm-core are flaky, but they fail only rarely. I've only seen them fail on Travis, and only when Travis is under load.

      Example failures:

      classname: org.apache.storm.metrics-test / testname: test-custom-metric-with-multi-tasks
      expected: (clojure.core/= [1 0 0 0 0 0 2] (clojure.core/subvec (org.apache.storm.metrics-test/lookup-bucket-by-comp-id-&-metric-name! "2" "my-custom-metric") 0 N__3207__auto__))
        actual: (not (clojure.core/= [1 0 0 0 0 0 2] [1 0 0 0 0 0 0]))
            at: test_runner.clj:105
      
      classname: org.apache.storm.metrics-test / testname: test-builtin-metrics-2
      expected: (clojure.core/= [1 1] (clojure.core/subvec (org.apache.storm.metrics-test/lookup-bucket-by-comp-id-&-metric-name! "myspout" "__emit-count/default") 0 N__3207__auto__))
        actual: (not (clojure.core/= [1 1] [1 0]))
            at: test_runner.clj:105
      

      The problem is that the tests increment metrics counters in the executor async loops, then expect the counts to end up in specific metrics buckets. The creation of a bucket is triggered by the metrics timer. The timer is covered by time simulation and by LocalCluster.waitForIdle, but the executor async loop isn't. So there is no guarantee that the executor async loop gets to run before the next bucket is created when the test does a sequence like

      Time.advanceClusterTime
      cluster.waitForIdle
      

      because the waitForIdle check doesn't know about the executor async loop.
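
      To make the ordering problem concrete, here is a minimal sketch of the race as a toy model. It does not use any Storm classes: MetricsBucketRaceSketch, executorLoopIteration, metricsTimerTick and the executorRunsBeforeSecondTick flag are all hypothetical names made up for illustration, and the flag simply stands in for whether the scheduler happened to run the executor async loop before the metrics timer fired.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of the race described above. None of these names are Storm APIs.
class MetricsBucketRaceSketch {
    // Metric counter that only the (simulated) executor async loop increments.
    private int counter = 0;
    // Buckets created by the (simulated) metrics timer.
    private final List<Integer> buckets = new ArrayList<>();

    // Stands in for one iteration of the executor async loop recording one emit.
    void executorLoopIteration() {
        counter++;
    }

    // Stands in for the metrics timer firing: the current count becomes a
    // bucket and the counter is reset.
    void metricsTimerTick() {
        buckets.add(counter);
        counter = 0;
    }

    // Simulates two rounds of "emit, advance time, wait for idle".
    // The flag models whether the executor loop got to run before the timer
    // fired in the second round (an assumption standing in for OS scheduling).
    static List<Integer> run(boolean executorRunsBeforeSecondTick) {
        MetricsBucketRaceSketch sim = new MetricsBucketRaceSketch();

        // Round 1: the executor loop runs before the timer fires.
        sim.executorLoopIteration();
        sim.metricsTimerTick();

        // Round 2: the order depends on scheduling.
        if (executorRunsBeforeSecondTick) {
            sim.executorLoopIteration();
            sim.metricsTimerTick();
        } else {
            sim.metricsTimerTick();       // timer fires first, so the bucket holds 0
            sim.executorLoopIteration();  // the increment lands in a later bucket
        }
        return sim.buckets;
    }

    public static void main(String[] args) {
        System.out.println(run(true));   // prints [1, 1]
        System.out.println(run(false));  // prints [1, 0]
    }
}
```

      The second result has the same shape as the test-builtin-metrics-2 failure above (expected [1 1], actual [1 0]). In the model, nothing in the "timer" path waits for the "executor" path, which mirrors how the waitForIdle check cannot account for the executor async loop.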

          People

            srdo Stig Rohde Døssing

              Time Tracking

                Estimated: Not Specified
                Remaining: 0h
                Logged: 3h 10m