Uploaded image for project: 'Solr'
  1. Solr
  2. SOLR-8785

Use Metrics library for core metrics

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 4.1
    • Fix Version/s: 6.4, 7.0
    • Component/s: metrics

      Description

      The Metrics library (https://dropwizard.github.io/metrics/3.1.0/) is a well-known way to track metrics about applications.

      In SOLR-1972, latency percentile tracking was added. The comment list is long, so here’s my synopsis:

      1. An attempt was made to use the Metrics library
      2. That attempt failed due to a memory leak in Metrics v2.1.1
      3. Large parts of Metrics were then copied wholesale into the org.apache.solr.util.stats package space and that was used instead.

      Copy/pasting Metrics code into Solr may have been the correct solution at the time, but I submit that it isn’t correct any more.
      The leak in Metrics was fixed even before SOLR-1972 was released, and by copy/pasting a subset of the functionality, we miss access to other important things that the Metrics library provides, particularly the concept of a Reporter. (https://dropwizard.github.io/metrics/3.1.0/manual/core/#reporters)

      Further, Metrics v3.0.2 is already packaged with Solr anyway, because it’s used in two contrib modules. (map-reduce and morphines-core)

      I’m proposing that:

      1. Metrics as bundled with Solr be upgraded to the current v3.1.2
      2. Most of the org.apache.solr.util.stats package space be deleted outright, or gutted and replaced with simple calls to Metrics. Due to the copy/paste origin, the concepts should mostly map 1:1.

      I’d further recommend a usage pattern like:
      SharedMetricRegistries.getOrCreate(System.getProperty(“solr.metrics.registry”, “solr-registry”))

      There are all kinds of areas in Solr that could benefit from metrics tracking and reporting. This pattern allows diverse areas of code to track metrics within a single, named registry. This well-known-name then becomes a handle you can use to easily attach a Reporter and ship all of those metrics off-box.

      1. SOLR_8775_rates_per_minute_fix.patch
        3 kB
        Shalin Shekhar Mangar
      2. SOLR-8785.patch
        78 kB
        Shalin Shekhar Mangar
      3. SOLR-8785.patch
        76 kB
        Shalin Shekhar Mangar
      4. SOLR-8785.patch
        77 kB
        Shalin Shekhar Mangar
      5. SOLR-8785-increment.patch
        9 kB
        Christine Poerschke

        Issue Links

          Activity

          Hide
          githubbot ASF GitHub Bot added a comment -

          GitHub user randomstatistic opened a pull request:

          https://github.com/apache/lucene-solr/pull/25

          SOLR-8785: Use Metrics library for core metrics

          There were three main areas that used the copied classes in org.apache.solr.util.stats:

          • AnalyticsStatisticsCollector
          • Overseer.Stats
          • RequestHandlerBase

          This patch adds depreciation tags to all the copied classes, and also replaces all usage of those classes with classes from the Metrics library.
          I added one new class (org.apache.solr.util.stats.Metrics) to provide some common access patterns for metrics gathering.

          This patch only adds Registry-based tracking to RequestHandlerBase, although all three areas are a fit for it. The effect is that all one needs to do is add a Reporter to the SharedMetricRegistry named “solr.registry.requesthandler” and all named request handler stats will be exported automatically.

          Compatibility notes:

          • The “totalTime” stat has been deleted from all three areas. This never seemed very useful, and Metrics didn’t support it in the Timer class, so it would have required some extra code to keep.
          • RequestHandler stats are now persistent, and will no longer reset on reload.

          You can merge this pull request into a Git repository by running:

          $ git pull https://github.com/randomstatistic/lucene-solr metrics_lib

          Alternatively you can review and apply these changes as the patch at:

          https://github.com/apache/lucene-solr/pull/25.patch

          To close this pull request, make a commit to your master/trunk branch
          with (at least) the following in the commit message:

          This closes #25


          commit 77ba4704399ecd5121a6941a3a75c1294172ed21
          Author: Jeff Wartes <jeff@lanndark.com>
          Date: 2016-03-16T18:27:35Z

          SOLR-8785 - Upgrade Metrics lib

          commit 9ad3a8179ac446f5820e051802a37bf8b2ba911b
          Author: Jeff Wartes <jeff@lanndark.com>
          Date: 2016-03-18T02:46:49Z

          SOLR-8785 - Use the Metrics lib instead of the old classes from the org.apache.solr.util.stats package space.

          commit 6ee11c807aa7432ec02f9ad63aefc7487a02566a
          Author: Jeff Wartes <jeff@lanndark.com>
          Date: 2016-03-18T02:55:33Z

          SOLR-8785 - Use persistent, reportable timers for named request handlers


          Show
          githubbot ASF GitHub Bot added a comment - GitHub user randomstatistic opened a pull request: https://github.com/apache/lucene-solr/pull/25 SOLR-8785 : Use Metrics library for core metrics There were three main areas that used the copied classes in org.apache.solr.util.stats: AnalyticsStatisticsCollector Overseer.Stats RequestHandlerBase This patch adds depreciation tags to all the copied classes, and also replaces all usage of those classes with classes from the Metrics library. I added one new class (org.apache.solr.util.stats.Metrics) to provide some common access patterns for metrics gathering. This patch only adds Registry-based tracking to RequestHandlerBase, although all three areas are a fit for it. The effect is that all one needs to do is add a Reporter to the SharedMetricRegistry named “solr.registry.requesthandler” and all named request handler stats will be exported automatically. Compatibility notes: The “totalTime” stat has been deleted from all three areas. This never seemed very useful, and Metrics didn’t support it in the Timer class, so it would have required some extra code to keep. RequestHandler stats are now persistent, and will no longer reset on reload. You can merge this pull request into a Git repository by running: $ git pull https://github.com/randomstatistic/lucene-solr metrics_lib Alternatively you can review and apply these changes as the patch at: https://github.com/apache/lucene-solr/pull/25.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #25 commit 77ba4704399ecd5121a6941a3a75c1294172ed21 Author: Jeff Wartes <jeff@lanndark.com> Date: 2016-03-16T18:27:35Z SOLR-8785 - Upgrade Metrics lib commit 9ad3a8179ac446f5820e051802a37bf8b2ba911b Author: Jeff Wartes <jeff@lanndark.com> Date: 2016-03-18T02:46:49Z SOLR-8785 - Use the Metrics lib instead of the old classes from the org.apache.solr.util.stats package space. commit 6ee11c807aa7432ec02f9ad63aefc7487a02566a Author: Jeff Wartes <jeff@lanndark.com> Date: 2016-03-18T02:55:33Z SOLR-8785 - Use persistent, reportable timers for named request handlers
          Hide
          elyograg Shawn Heisey added a comment - - edited

          RequestHandler stats are now persistent, and will no longer reset on reload.

          If I understand this correctly, then I don't think we want to do this. Perhaps I don't understand it correctly?

          There needs to be a way to reset the statistics to zero. Reloading the core is currently the way that I do this. (I'm not running cloud)

          I don't want you to think I'm completely against persistence. The idea is VERY nice, but having it turned on by default could cause issues for users with existing workflows. I think the way to handle this particular feature is:

          • Introduce a new config parameter to enable persistence. Default the parameter to "false".
          • Discuss a new default of "true" in 7.0. If consensus is to change the default in master (which probably is a good idea), then enable it in 6.x example configs so it most brand-new setups will use it.
          Show
          elyograg Shawn Heisey added a comment - - edited RequestHandler stats are now persistent, and will no longer reset on reload. If I understand this correctly, then I don't think we want to do this. Perhaps I don't understand it correctly? There needs to be a way to reset the statistics to zero. Reloading the core is currently the way that I do this. (I'm not running cloud) I don't want you to think I'm completely against persistence. The idea is VERY nice, but having it turned on by default could cause issues for users with existing workflows. I think the way to handle this particular feature is: Introduce a new config parameter to enable persistence. Default the parameter to "false". Discuss a new default of "true" in 7.0. If consensus is to change the default in master (which probably is a good idea), then enable it in 6.x example configs so it most brand-new setups will use it.
          Hide
          jwartes Jeff Wartes added a comment -

          Providing a way to reset is fine, but I've never been convinced a reload was the right way to do it. We're measuring stats about the RequestHandler, right? Why would we want to lose and rebuild our (weighted-moving-average) p99 latency measurement just because we changed our commit interval or something?

          Show
          jwartes Jeff Wartes added a comment - Providing a way to reset is fine, but I've never been convinced a reload was the right way to do it. We're measuring stats about the RequestHandler, right? Why would we want to lose and rebuild our (weighted-moving-average) p99 latency measurement just because we changed our commit interval or something?
          Hide
          elyograg Shawn Heisey added a comment -

          Perhaps it's not the right way, but currently it's the only way I know of.

          In master, defaulting to persistence is probably an excellent option, especially if there is a reset option. I just worry about causing problems for existing 6.x users.

          So, combining everything into one coherent plan:

          • Create a config option for 6.x to enable persistence, that defaults to false. Set it to true in 6.x example configs.
          • Add CoreAdmin and CollectionsAdmin actions to reset stats to 6.x and master..
          • In master, the option to enable persistence will not exist. Persistence will always be enabled. Users will be able to use the reset action.
          Show
          elyograg Shawn Heisey added a comment - Perhaps it's not the right way, but currently it's the only way I know of. In master, defaulting to persistence is probably an excellent option, especially if there is a reset option. I just worry about causing problems for existing 6.x users. So, combining everything into one coherent plan: Create a config option for 6.x to enable persistence, that defaults to false. Set it to true in 6.x example configs. Add CoreAdmin and CollectionsAdmin actions to reset stats to 6.x and master.. In master, the option to enable persistence will not exist. Persistence will always be enabled. Users will be able to use the reset action.
          Hide
          jwartes Jeff Wartes added a comment -

          For the record, it looks like I wrote this patch against master, around about version 6.1.
          I recall I had some concern at the time that the metrics namespace generation was too flexible (complicated), so that's something to look at.

          Show
          jwartes Jeff Wartes added a comment - For the record, it looks like I wrote this patch against master, around about version 6.1. I recall I had some concern at the time that the metrics namespace generation was too flexible (complicated), so that's something to look at.
          Hide
          shalinmangar Shalin Shekhar Mangar added a comment -

          I have an interest in this. I'll take a look at the changes.

          Show
          shalinmangar Shalin Shekhar Mangar added a comment - I have an interest in this. I'll take a look at the changes.
          Hide
          shalinmangar Shalin Shekhar Mangar added a comment -
          • This patch applies on master.
          • It also fixes many errors found by precommit.
          • The metrics library has removed Timer.getSum method so the "totalTime" metric from OverseerStatus API cannot be supported. I think we can live without it. I'll note that in CHANGES.txt
          • The classes copied from codahale-metrics have been removed entirely instead of being deprecated.
          • Although I've kept the Metrics class verbatim from the pull request in this patch. I'm inclined to remove most of it since most methods other than addTimerMetrics aren't being used anywhere except in MetricsTest. I'd like to evaluate the changes proposed by Alan in SOLR-4735 instead.
          Show
          shalinmangar Shalin Shekhar Mangar added a comment - This patch applies on master. It also fixes many errors found by precommit. The metrics library has removed Timer.getSum method so the "totalTime" metric from OverseerStatus API cannot be supported. I think we can live without it. I'll note that in CHANGES.txt The classes copied from codahale-metrics have been removed entirely instead of being deprecated. Although I've kept the Metrics class verbatim from the pull request in this patch. I'm inclined to remove most of it since most methods other than addTimerMetrics aren't being used anywhere except in MetricsTest. I'd like to evaluate the changes proposed by Alan in SOLR-4735 instead.
          Hide
          jwartes Jeff Wartes added a comment -

          I'm trying to remember now, but I think the other stuff in that Metrics was largely scaffolding for what I thought would be another jira for exposing a way to register one or more Reporters. I was trying to focus on deleting the copied code here, but the ability to use Reporters was what really drove all this for me.

          As I recall it, my thought was that each logical section of code that tracked metrics would use the namedTimer(String name, String registry) api, (using the section name for the second parameter) and the user could then use properties that manipulate the registry names to choose which sections shared which registries.
          Then, the (say) Overseer could use one registry, and the AnalyticsStatisticsCollector could use another, and you could attach different Reporters to each. Or, you could have both of those places use the same registry, and therefore the same Reporter.

          This is all contingent, naturally, on the registry name being the thing you use to find the registry you want to attach a Reporter to, which could be debated as a best practice.

          Show
          jwartes Jeff Wartes added a comment - I'm trying to remember now, but I think the other stuff in that Metrics was largely scaffolding for what I thought would be another jira for exposing a way to register one or more Reporters. I was trying to focus on deleting the copied code here, but the ability to use Reporters was what really drove all this for me. As I recall it, my thought was that each logical section of code that tracked metrics would use the namedTimer(String name, String registry) api, (using the section name for the second parameter) and the user could then use properties that manipulate the registry names to choose which sections shared which registries. Then, the (say) Overseer could use one registry, and the AnalyticsStatisticsCollector could use another, and you could attach different Reporters to each. Or, you could have both of those places use the same registry, and therefore the same Reporter. This is all contingent, naturally, on the registry name being the thing you use to find the registry you want to attach a Reporter to, which could be debated as a best practice.
          Hide
          shalinmangar Shalin Shekhar Mangar added a comment -

          Thanks Jeff. I think the bulk of Metrics should be added after we've flushed out the new design for metrics collection either via namedTimer or the way Alan proposed in SOLR-4735. Right now, your/this patch is enough to remove the copied code which is the first step.

          Show
          shalinmangar Shalin Shekhar Mangar added a comment - Thanks Jeff. I think the bulk of Metrics should be added after we've flushed out the new design for metrics collection either via namedTimer or the way Alan proposed in SOLR-4735 . Right now, your/this patch is enough to remove the copied code which is the first step.
          Hide
          cpoerschke Christine Poerschke added a comment -

          Hi Jeff Wartes and Shalin Shekhar Mangar,

          in our team one of my colleagues (Kelvin Wong) is currently also working on metrics stuff i.e. we share your interest in this.

          The attached SOLR-8785-increment.patch takes your SOLR-8785.patch from earlier today and adds an adapted version of Kelvin's changes on top of it; summary of the differences:

          • TimerUtils.java as alternative to Metrics.java class
          • TimerUtilsTest to tie metric names to snapshot accessors
          • one import reorder (com before org) in RequestHandlerBase.java
          • observation: existing code uses both thPcRequestTime and thPctlRequestTime
          Show
          cpoerschke Christine Poerschke added a comment - Hi Jeff Wartes and Shalin Shekhar Mangar , in our team one of my colleagues (Kelvin Wong) is currently also working on metrics stuff i.e. we share your interest in this. The attached SOLR-8785 -increment.patch takes your SOLR-8785 .patch from earlier today and adds an adapted version of Kelvin's changes on top of it; summary of the differences: TimerUtils.java as alternative to Metrics.java class TimerUtilsTest to tie metric names to snapshot accessors one import reorder (com before org) in RequestHandlerBase.java observation: existing code uses both thPcRequestTime and thPctlRequestTime TimerUtils.java favours the RequestHandlerBase variant and thus changes the OverseerStatusCmd variant since the former seems more widely used.
          Hide
          wunder Walter Underwood added a comment -

          This is great!

          I'm going to try using IntrumentedHandler for Jetty from the Codahale library. That might be pretty straightforward, but we'll see.

          http://metrics.dropwizard.io/3.1.0/manual/jetty/

          Show
          wunder Walter Underwood added a comment - This is great! I'm going to try using IntrumentedHandler for Jetty from the Codahale library. That might be pretty straightforward, but we'll see. http://metrics.dropwizard.io/3.1.0/manual/jetty/
          Hide
          wunder Walter Underwood added a comment -

          We built a servlet filter for Tomcat to do this kind of thing. Not too complicated, but I'm not sure how general it is. It also has some New Relic custom metrics.

          We name the metrics as "corename.response.handlername". SolrJ goes to /select and sends the handlername in a qt parameter, so we conflate that with requests that go directly to the handler endpoint.

          Show
          wunder Walter Underwood added a comment - We built a servlet filter for Tomcat to do this kind of thing. Not too complicated, but I'm not sure how general it is. It also has some New Relic custom metrics. We name the metrics as "corename.response.handlername". SolrJ goes to /select and sends the handlername in a qt parameter, so we conflate that with requests that go directly to the handler endpoint.
          Hide
          shalinmangar Shalin Shekhar Mangar added a comment -

          Thanks Christine.

          There was another minor difference between Metrics.java and TimerUtils – Metrics.addTimerMetrics also adds "avgRequestsPerMinute", "5minRateRequestsPerMinute" and "15minRateRequestsPerMinute" which TimerUtils.addMetrics does not.

          This combined patch keeps the additional stats from Metrics but uses the TimerUtils name as it suggests this class being a utility class instead of a holistic Metrics solution which it isn't – yet.

          Thanks for fixing the "thPcRequestTime" vs "thPctlRequestTime" inconsistency.

          I'll commit this and open a few Jiras for further work.

          Show
          shalinmangar Shalin Shekhar Mangar added a comment - Thanks Christine. There was another minor difference between Metrics.java and TimerUtils – Metrics.addTimerMetrics also adds "avgRequestsPerMinute", "5minRateRequestsPerMinute" and "15minRateRequestsPerMinute" which TimerUtils.addMetrics does not. This combined patch keeps the additional stats from Metrics but uses the TimerUtils name as it suggests this class being a utility class instead of a holistic Metrics solution which it isn't – yet. Thanks for fixing the "thPcRequestTime" vs "thPctlRequestTime" inconsistency. I'll commit this and open a few Jiras for further work.
          Hide
          shalinmangar Shalin Shekhar Mangar added a comment -

          Walter Underwood – This was going to be my follow-up issue. I opened SOLR-9788 for using the instrumented jetty classes in Solr.

          Show
          shalinmangar Shalin Shekhar Mangar added a comment - Walter Underwood – This was going to be my follow-up issue. I opened SOLR-9788 for using the instrumented jetty classes in Solr.
          Hide
          jwartes Jeff Wartes added a comment -

          Understood, I'm all for incremental change, and I don't see "how to make a Reporter" as part of this issue. I will be slightly disappointed though, if we convert to the library without also providing a recommended access path for the use of that library. Gathering metrics you can't report on is useless, and one of the things I liked about the original patch was this:

              if(this.pluginInfo==null) {
                // if a request handler has a name, use a persistent, reportable timer under that name
                if (pluginInfo.name != null)
                  requestTimes = Metrics.namedTimer(Metrics.mkName(this.getClass(), pluginInfo.name), REGISTRY_NAME);
                this.pluginInfo = pluginInfo;
              }
          

          This meant that I automatically got access to all the relevant metrics for any named request handler, using any Reporters (Log, JMX, Graphite, whatever) I cared to attach. This, in turn, was only possible because all those metrics were in a well-defined and accessible location.

          Show
          jwartes Jeff Wartes added a comment - Understood, I'm all for incremental change, and I don't see "how to make a Reporter" as part of this issue. I will be slightly disappointed though, if we convert to the library without also providing a recommended access path for the use of that library. Gathering metrics you can't report on is useless, and one of the things I liked about the original patch was this: if ( this .pluginInfo== null ) { // if a request handler has a name, use a persistent, reportable timer under that name if (pluginInfo.name != null ) requestTimes = Metrics.namedTimer(Metrics.mkName( this .getClass(), pluginInfo.name), REGISTRY_NAME); this .pluginInfo = pluginInfo; } This meant that I automatically got access to all the relevant metrics for any named request handler, using any Reporters (Log, JMX, Graphite, whatever) I cared to attach. This, in turn, was only possible because all those metrics were in a well-defined and accessible location.
          Hide
          shalinmangar Shalin Shekhar Mangar added a comment -

          Yeah, we want to be able to hook different reporters – but as I said before, I do not want to rush into a solution without giving SOLR-4735 a proper thought. These kind of things are hard to change once they're released because any breaking change makes existing monitoring setups useless. So I want to use your patch as a stepping stone to get rid of the copied classes. It will take multiple focused Jira issues to go all the way.

          Show
          shalinmangar Shalin Shekhar Mangar added a comment - Yeah, we want to be able to hook different reporters – but as I said before, I do not want to rush into a solution without giving SOLR-4735 a proper thought. These kind of things are hard to change once they're released because any breaking change makes existing monitoring setups useless. So I want to use your patch as a stepping stone to get rid of the copied classes. It will take multiple focused Jira issues to go all the way.
          Hide
          shalinmangar Shalin Shekhar Mangar added a comment -

          FInal patch. Removed an unused import. Fixed TimerUtilsTest to test all metrics added by TimerUtils except for "avgRequestsPerMinute" which can't be tested because it changes with elapsed time.

          I'll commit this now.

          Show
          shalinmangar Shalin Shekhar Mangar added a comment - FInal patch. Removed an unused import. Fixed TimerUtilsTest to test all metrics added by TimerUtils except for "avgRequestsPerMinute" which can't be tested because it changes with elapsed time. I'll commit this now.
          Hide
          jira-bot ASF subversion and git services added a comment -

          Commit ff6da66601ca454941f6e3f0957068f5269319a6 in lucene-solr's branch refs/heads/master from Shalin Shekhar Mangar
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=ff6da66 ]

          SOLR-8785: Use Dropwizard Metrics library for core metrics

          Show
          jira-bot ASF subversion and git services added a comment - Commit ff6da66601ca454941f6e3f0957068f5269319a6 in lucene-solr's branch refs/heads/master from Shalin Shekhar Mangar [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=ff6da66 ] SOLR-8785 : Use Dropwizard Metrics library for core metrics
          Hide
          jira-bot ASF subversion and git services added a comment -

          Commit 2deb900774b2ec56ea54be3c4209f8ad4b83dc99 in lucene-solr's branch refs/heads/branch_6x from Shalin Shekhar Mangar
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=2deb900 ]

          SOLR-8785: Use Dropwizard Metrics library for core metrics

          (cherry picked from commit ff6da66)

          Show
          jira-bot ASF subversion and git services added a comment - Commit 2deb900774b2ec56ea54be3c4209f8ad4b83dc99 in lucene-solr's branch refs/heads/branch_6x from Shalin Shekhar Mangar [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=2deb900 ] SOLR-8785 : Use Dropwizard Metrics library for core metrics (cherry picked from commit ff6da66)
          Hide
          jira-bot ASF subversion and git services added a comment -

          Commit 87dc02e3c49eb4b09c12a798790a0475417ff19c in lucene-solr's branch refs/heads/master from Christine Poerschke
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=87dc02e ]

          SOLR-8785: tweak attribution

          Show
          jira-bot ASF subversion and git services added a comment - Commit 87dc02e3c49eb4b09c12a798790a0475417ff19c in lucene-solr's branch refs/heads/master from Christine Poerschke [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=87dc02e ] SOLR-8785 : tweak attribution
          Hide
          jira-bot ASF subversion and git services added a comment -

          Commit e55742e056661c6e6f6e7a751b2a6e67a5ff1f96 in lucene-solr's branch refs/heads/branch_6x from Christine Poerschke
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=e55742e ]

          SOLR-8785: tweak attribution

          Show
          jira-bot ASF subversion and git services added a comment - Commit e55742e056661c6e6f6e7a751b2a6e67a5ff1f96 in lucene-solr's branch refs/heads/branch_6x from Christine Poerschke [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=e55742e ] SOLR-8785 : tweak attribution
          Hide
          shalinmangar Shalin Shekhar Mangar added a comment -

          Thanks Jeff, Kelvin and Christine!

          Show
          shalinmangar Shalin Shekhar Mangar added a comment - Thanks Jeff, Kelvin and Christine!
          Hide
          kwong494 Kelvin Wong added a comment - - edited

          Thanks Shalin, Jeff, and Christine!

          One problem I noticed is that Codahale Timers report per-second rates by default. In the patch above, we've wrongly named these rates as per-minute rates in TimerUtils.java. Because we cannot configure the Timer to report on a per-minute basis, I propose we standardize the rates to be reported on a per-second basis. Otherwise, we would need a utility function to do the conversion.

          Please see below for a potential fix. What this means now is that:

          • Timer rates are reported in a standard per-second basis. This changes existing behaviour in OverseerStatusCmd.java which used to report in a per-minute basis.
          • In RequestHandlerBase.java and AnalyticsStatisticsCollector.java, the naming of the rates metrics is changed. For example, from '5minRateReqsPerSecond' to '5minRateRequestsPerSecond'. This keeps the naming consistent with the 'avgRequestsPerSecond'.

          solr/core/src/java/org/apache/solr/util/stats/TimerUtils.java

          - lst.add("avgRequestsPerMinute", timer.getMeanRate());
          - lst.add("5minRateRequestsPerMinute", timer.getFiveMinuteRate());
          - lst.add("15minRateRequestsPerMinute", timer.getFifteenMinuteRate());
          + lst.add("avgRequestsPerSecond", timer.getMeanRate());
          + lst.add("5minRateRequestsPerSecond", timer.getFiveMinuteRate());
          + lst.add("15minRateRequestsPerSecond", timer.getFifteenMinuteRate());
          

          solr/core/src/test/org/apache/solr/util/stats/TimerUtilsTest.java

          - // cannot test avgRequestsPerMinute directly because mean rate changes as time increases!
          - // assertEquals(lst.get("avgRequestsPerMinute"), timer.getMeanRate());
          - assertEquals(lst.get("5minRateRequestsPerMinute"), timer.getFiveMinuteRate());
          - assertEquals(lst.get("15minRateRequestsPerMinute"), timer.getFifteenMinuteRate());
          + // cannot test avgRequestsPerSecond directly because mean rate changes as time increases!
          + // assertEquals(lst.get("avgRequestsPerSecond"), timer.getMeanRate());
          + assertEquals(lst.get("5minRateRequestsPerSecond"), timer.getFiveMinuteRate());
          + assertEquals(lst.get("15minRateRequestsPerSecond"), timer.getFifteenMinuteRate());
          
          Show
          kwong494 Kelvin Wong added a comment - - edited Thanks Shalin, Jeff, and Christine! One problem I noticed is that Codahale Timers report per-second rates by default. In the patch above, we've wrongly named these rates as per-minute rates in TimerUtils.java. Because we cannot configure the Timer to report on a per-minute basis, I propose we standardize the rates to be reported on a per-second basis. Otherwise, we would need a utility function to do the conversion. Please see below for a potential fix. What this means now is that: Timer rates are reported in a standard per-second basis. This changes existing behaviour in OverseerStatusCmd.java which used to report in a per-minute basis. In RequestHandlerBase.java and AnalyticsStatisticsCollector.java, the naming of the rates metrics is changed. For example, from '5minRateReqsPerSecond' to '5minRateRequestsPerSecond'. This keeps the naming consistent with the 'avgRequestsPerSecond'. solr/core/src/java/org/apache/solr/util/stats/TimerUtils.java - lst.add( "avgRequestsPerMinute" , timer.getMeanRate()); - lst.add( "5minRateRequestsPerMinute" , timer.getFiveMinuteRate()); - lst.add( "15minRateRequestsPerMinute" , timer.getFifteenMinuteRate()); + lst.add( "avgRequestsPerSecond" , timer.getMeanRate()); + lst.add( "5minRateRequestsPerSecond" , timer.getFiveMinuteRate()); + lst.add( "15minRateRequestsPerSecond" , timer.getFifteenMinuteRate()); solr/core/src/test/org/apache/solr/util/stats/TimerUtilsTest.java - // cannot test avgRequestsPerMinute directly because mean rate changes as time increases! - // assertEquals(lst.get( "avgRequestsPerMinute" ), timer.getMeanRate()); - assertEquals(lst.get( "5minRateRequestsPerMinute" ), timer.getFiveMinuteRate()); - assertEquals(lst.get( "15minRateRequestsPerMinute" ), timer.getFifteenMinuteRate()); + // cannot test avgRequestsPerSecond directly because mean rate changes as time increases! + // assertEquals(lst.get( "avgRequestsPerSecond" ), timer.getMeanRate()); + assertEquals(lst.get( "5minRateRequestsPerSecond" ), timer.getFiveMinuteRate()); + assertEquals(lst.get( "15minRateRequestsPerSecond" ), timer.getFifteenMinuteRate());
          Hide
          shalinmangar Shalin Shekhar Mangar added a comment -

          Thanks Kelvin, I'll fix.

          Show
          shalinmangar Shalin Shekhar Mangar added a comment - Thanks Kelvin, I'll fix.
          Hide
          shalinmangar Shalin Shekhar Mangar added a comment -

          I think we should keep our per minute rates because this is a back-compat break for everyone who uses these values. This isn't just limited to Overseer Status; the handler statistics also return these values.

          This patch adds a helper function in TimerUtils called convertRateToPerMinute which performs the conversion.

          Show
          shalinmangar Shalin Shekhar Mangar added a comment - I think we should keep our per minute rates because this is a back-compat break for everyone who uses these values. This isn't just limited to Overseer Status; the handler statistics also return these values. This patch adds a helper function in TimerUtils called convertRateToPerMinute which performs the conversion.
          Hide
          kwong494 Kelvin Wong added a comment -

          I think the RequestHandlerBase and AnalyticsStatisticsCollector previously used per-second rates; only the OverseerStatusCmd uses per-minute rates. Standardizing all rates to per-minute will break backwards compatibility with those classes. Apologies for not being clearer earlier.

          Show
          kwong494 Kelvin Wong added a comment - I think the RequestHandlerBase and AnalyticsStatisticsCollector previously used per-second rates; only the OverseerStatusCmd uses per-minute rates. Standardizing all rates to per-minute will break backwards compatibility with those classes. Apologies for not being clearer earlier.
          Hide
          jira-bot ASF subversion and git services added a comment -

          Commit dab2e2465697f2318c9d02c7e423ca1fd0a7488b in lucene-solr's branch refs/heads/master from Shalin Shekhar Mangar
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=dab2e24 ]

          SOLR-8785: Convert rates to be per minute from the default per second rates reported by the metrics library

          Show
          jira-bot ASF subversion and git services added a comment - Commit dab2e2465697f2318c9d02c7e423ca1fd0a7488b in lucene-solr's branch refs/heads/master from Shalin Shekhar Mangar [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=dab2e24 ] SOLR-8785 : Convert rates to be per minute from the default per second rates reported by the metrics library
          Hide
          shalinmangar Shalin Shekhar Mangar added a comment -

          Ah, I was too eager to commit. You are right, I overlooked that (again). In that case you are right and we should standardize on per-second rates everywhere. I'll fix.

          Show
          shalinmangar Shalin Shekhar Mangar added a comment - Ah, I was too eager to commit. You are right, I overlooked that (again). In that case you are right and we should standardize on per-second rates everywhere. I'll fix.
          Hide
          jira-bot ASF subversion and git services added a comment -

          Commit f8fa2e998d094223702e12d7bd8a84985859a747 in lucene-solr's branch refs/heads/master from Shalin Shekhar Mangar
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=f8fa2e9 ]

          SOLR-8785: Use per-second rates for consistency in all stats outputs

          Show
          jira-bot ASF subversion and git services added a comment - Commit f8fa2e998d094223702e12d7bd8a84985859a747 in lucene-solr's branch refs/heads/master from Shalin Shekhar Mangar [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=f8fa2e9 ] SOLR-8785 : Use per-second rates for consistency in all stats outputs
          Hide
          jira-bot ASF subversion and git services added a comment -

          Commit df242c8437351112e11b9de441ae86f065919878 in lucene-solr's branch refs/heads/branch_6x from Shalin Shekhar Mangar
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=df242c8 ]

          SOLR-8785: Convert rates to be per minute from the default per second rates reported by the metrics library

          (cherry picked from commit dab2e24)

          Show
          jira-bot ASF subversion and git services added a comment - Commit df242c8437351112e11b9de441ae86f065919878 in lucene-solr's branch refs/heads/branch_6x from Shalin Shekhar Mangar [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=df242c8 ] SOLR-8785 : Convert rates to be per minute from the default per second rates reported by the metrics library (cherry picked from commit dab2e24)
          Hide
          jira-bot ASF subversion and git services added a comment -

          Commit 8a5bb46663be7bece756f83e1005821ded337647 in lucene-solr's branch refs/heads/branch_6x from Shalin Shekhar Mangar
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=8a5bb46 ]

          SOLR-8785: Use per-second rates for consistency in all stats outputs

          (cherry picked from commit f8fa2e9)

          Show
          jira-bot ASF subversion and git services added a comment - Commit 8a5bb46663be7bece756f83e1005821ded337647 in lucene-solr's branch refs/heads/branch_6x from Shalin Shekhar Mangar [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=8a5bb46 ] SOLR-8785 : Use per-second rates for consistency in all stats outputs (cherry picked from commit f8fa2e9)
          Hide
          shalinmangar Shalin Shekhar Mangar added a comment -

          Thanks for the review, Kelvin!

          Show
          shalinmangar Shalin Shekhar Mangar added a comment - Thanks for the review, Kelvin!
          Hide
          kwong494 Kelvin Wong added a comment -

          No problem! Great to see this resolved.

          Show
          kwong494 Kelvin Wong added a comment - No problem! Great to see this resolved.

            People

            • Assignee:
              shalinmangar Shalin Shekhar Mangar
              Reporter:
              jwartes Jeff Wartes
            • Votes:
              0 Vote for this issue
              Watchers:
              12 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development