  1. Apache Storm
  2. STORM-2610

Spout throttling metrics are unusable

    Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.0.0, 2.0.0
    • Fix Version/s: 2.0.0
    • Component/s: storm-client, storm-core
    • Labels:
      None

      Description

      When helping someone debug a backpressure issue, I realized that the metrics we collect in the spout are mistakenly being multiplied by the sampling rate, even though we are not sub-sampling them. As a result the values are, by default, 20 times higher than they should be. Thinking about how I would use the metrics to debug an issue also showed that they are not comparable with one another: skipped-max-spout and skipped-throttle each correspond to about 1 ms of sleep, but skipped-inactive corresponds to about 100 ms of sleep. The 1 ms sleep is also configurable, so it can differ from one topology to another, and the code around it is pluggable, so it could be doing anything from not sleeping at all to sleeping a random amount of time.
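      To make the inflation concrete, here is a minimal sketch of the arithmetic. It assumes a sampling rate of 0.05, which is consistent with the 20x figure above; the class and method names are hypothetical and are not Storm's actual metric code.

```java
// Hypothetical illustration of the bug; not Storm's actual metric classes.
class InflatedCountDemo {
    // Scale a count as if only a sampleRate fraction of events had been observed.
    static long scaleForSampling(long observed, double sampleRate) {
        return Math.round(observed / sampleRate);
    }

    public static void main(String[] args) {
        double sampleRate = 0.05; // assumed rate (1 in 20), matching the 20x figure
        long actualSkips = 100;   // every skip was counted; nothing was sampled out
        // Applying the scaling anyway inflates the reported value.
        long reported = scaleForSampling(actualSkips, sampleRate);
        System.out.println("actual=" + actualSkips + " reported=" + reported);
        // prints "actual=100 reported=2000"
    }
}
```

      Scaling is only valid when the counter really saw one event in twenty. Here every event was counted, so the scaled value overstates the truth by the inverse of the rate.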

      I think we just need to scrap what we have been doing and record how long we sleep for and use that as the metric instead.
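      One way this proposal could look is sketched below. This is an assumption about the shape of the fix, not the change that was actually committed; `SleepTimeMetric` and its methods are hypothetical names. The idea is to time whatever the pluggable wait strategy does and accumulate the elapsed milliseconds, so the metric stays meaningful whether the strategy sleeps 1 ms, 100 ms, a random amount, or not at all.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch; names are illustrative and not Storm's API.
class SleepTimeMetric {
    private final AtomicLong sleptMs = new AtomicLong();

    // Run a wait strategy and record how long it actually blocked.
    void recordSleep(Runnable waitStrategy) {
        long start = System.nanoTime();
        waitStrategy.run();
        long elapsedMs = (System.nanoTime() - start) / 1_000_000L;
        sleptMs.addAndGet(elapsedMs);
    }

    // Add an already-measured duration in milliseconds.
    void add(long ms) {
        sleptMs.addAndGet(ms);
    }

    // Return the time accumulated so far and reset for the next reporting window.
    long getAndReset() {
        return sleptMs.getAndSet(0);
    }
}
```

      Because the value is a measured duration rather than a count of skipped calls, no sampling multiplier applies, and the figures for different skip reasons can be compared directly.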

      These metrics also don't appear to be documented anywhere, so I am going to change what they mean and document them so that they are actually useful and correct.

              People

              • Assignee:
                revans2 Robert Joseph Evans
                Reporter:
                revans2 Robert Joseph Evans
              • Votes:
                0
                Watchers:
                1

                  Time Tracking

                  Estimated:
                  Not Specified
                  Remaining:
                  0h
                  Logged:
                  1.5h