Two new metrics should be defined:
Failed Initiator cycles
Failed Cleaner cycles
They should be measured as part of the error handling in the services. Lock timeouts on the AUX lock should be ignored and not counted as failures.
These should be RatioGauges (failed cycles / successful cycles).
A RatioGauge implementation is available in the metrics package in common; a similar one should be created in the metastore. The implementation in common is built on top of the MetricsVariable interface, where the caller provides the metric value from outside. In the metastore it should follow the existing Gauge implementation instead, where the metrics class itself handles the AtomicIntegers.
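
A minimal sketch of what such a metastore-side gauge could look like is given below. It is illustrative only: the names FailedCycleRatioGauge, record and AuxLockTimeoutException are hypothetical and do not refer to existing classes, and returning NaN when there are no successful cycles yet is an assumption, not a requirement of this ticket.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for whatever signals a timeout on the AUX lock. */
class AuxLockTimeoutException extends RuntimeException {}

/** Sketch of a ratio gauge (failed / successful cycles) that owns its counters. */
final class FailedCycleRatioGauge {
    // The gauge keeps the AtomicIntegers itself instead of reading them
    // through an externally supplied variable.
    private final AtomicInteger failed = new AtomicInteger();
    private final AtomicInteger succeeded = new AtomicInteger();

    /** Runs one cycle and records its outcome. */
    void record(Runnable cycle) {
        try {
            cycle.run();
            succeeded.incrementAndGet();
        } catch (AuxLockTimeoutException e) {
            // AUX lock timeouts are ignored: neither a failure nor a success.
        } catch (RuntimeException e) {
            // Any other error in the cycle counts as a failure.
            failed.incrementAndGet();
        }
    }

    /** Returns failed / successful, or NaN while no cycle has succeeded (assumption). */
    double getValue() {
        int denominator = succeeded.get();
        if (denominator == 0) {
            return Double.NaN;
        }
        return (double) failed.get() / denominator;
    }
}
```

In this sketch the Initiator and the Cleaner would each hold their own gauge instance and wrap every cycle in record(...), so the counting stays inside the existing error handling. The sketch swallows the exception for brevity; a real service would still log it.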