Details
Type: Improvement
Status: Closed
Priority: Major
Resolution: Fixed
Event 4.2.24
None
Description
We use the Prometheus exporter to export Sling Metrics / Dropwizard metrics, and we often see messages like this:
10.03.2022 08:50:15.333 [...] *WARN* [qtp568481508-1779] io.prometheus.client.dropwizard.DropwizardExports Gauge has been blacklisted for 300000 ms due timeout: Generated from Dropwizard metric import (metric=sling_event.jobs.cancelled.count, type=org.apache.sling.event.impl.jobs.stats.GaugeSupport$2)
This means that calculating the metric took too long. We should make sure that the calculation is done asynchronously and that only pre-computed values are returned when the gauge is read.
For at least these values the handling needs to be improved:
- sling_event.jobs.active.count
- sling_event.jobs.averageProcessingTime
- sling_event.jobs.averageWaitingTime
- sling_event.jobs.cancelled.count
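
To illustrate the idea of returning pre-computed values, here is a minimal sketch in plain Java. It is not the actual Sling Event change: the class name `PrecomputedGauge`, the `expensive` supplier, and the refresh period are hypothetical and chosen only for illustration. A background thread recomputes the value periodically, and `getValue()` only reads the cached result, so a slow calculation can no longer delay the exporter's scrape.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongSupplier;

/** Hypothetical helper: refreshes a value in the background so reads never block. */
class PrecomputedGauge implements AutoCloseable {
    // Last successfully computed value; starts at 0 until the first refresh completes.
    private final AtomicLong cached = new AtomicLong();
    private final LongSupplier expensive;
    private final ScheduledExecutorService scheduler;

    PrecomputedGauge(LongSupplier expensive, long periodMillis) {
        this.expensive = expensive;
        this.scheduler = Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "gauge-refresh");
            t.setDaemon(true); // do not keep the JVM alive just for metrics
            return t;
        });
        // Fixed delay: a slow computation postpones the next run instead of piling up.
        scheduler.scheduleWithFixedDelay(this::refresh, 0, periodMillis, TimeUnit.MILLISECONDS);
    }

    /** Runs the expensive calculation and stores the result. */
    void refresh() {
        try {
            cached.set(expensive.getAsLong());
        } catch (RuntimeException e) {
            // Keep the previous value; swallowing the exception also keeps
            // the scheduled task from being cancelled.
        }
    }

    /** Cheap read of the last computed value; never triggers the calculation. */
    long getValue() {
        return cached.get();
    }

    @Override
    public void close() {
        scheduler.shutdownNow();
    }
}
```

In a Dropwizard setup, a gauge registered for a metric such as sling_event.jobs.cancelled.count would delegate its `getValue()` to an object like this one. The trade-off is that the reported value can be stale by up to one refresh period.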