Details
Type: Improvement
Status: Open
Priority: Major
Resolution: Unresolved
Description
Background
The number of cleaner threads (responsible for cleaning up/compacting topics whose `cleanup.policy` contains "compact") is configured using `log.cleaner.threads`: https://kafka.apache.org/documentation.html#brokerconfigs_log.cleaner.threads
Problem
When the number of threads is inadequate to handle the compaction load, the user will notice an increase in the `max-compaction-delay-secs` metric. However, an increase in this metric does not necessarily mean that the threads are overloaded. For example, the metric could be increasing because all cleaner threads are being throttled.
Requirement
We want a mechanism to determine when the number of cleaner threads should be increased.
Proposal
Add a thread pool utilization metric for log cleaner thread busy percentage. This is similar to the thread pool utilization metrics we already have for io-threads, network-threads, etc. When emitted, the metric will report the number of threads that are actively doing some work, i.e. are not sleeping at https://github.com/apache/kafka/blob/4aee33d6ab1345243e426e05388f6fc512970e93/core/src/main/scala/kafka/log/LogCleaner.scala#L387
Note that this Jira requires a KIP since we are adding a new metric.
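To make the proposal concrete, here is a minimal sketch of how the busy count and busy percentage could be tracked. It is an illustration only, not existing Kafka code: the class `CleanerThreadUtilization` and its methods `markBusy`, `markIdle`, `busyCount` and `busyPercent` are hypothetical names, and the sketch assumes each cleaner thread marks itself busy before doing work and idle before it sleeps.

```java
import java.util.concurrent.atomic.AtomicInteger;

/**
 * Hypothetical helper (not part of Kafka): counts cleaner threads that are
 * currently doing work, so a gauge can report pool utilization.
 */
final class CleanerThreadUtilization {
    private final int totalThreads;
    private final AtomicInteger busyThreads = new AtomicInteger(0);

    CleanerThreadUtilization(int totalThreads) {
        if (totalThreads <= 0) {
            throw new IllegalArgumentException("totalThreads must be positive");
        }
        this.totalThreads = totalThreads;
    }

    /** Assumed to be called by a cleaner thread just before it starts cleaning. */
    void markBusy() {
        busyThreads.incrementAndGet();
    }

    /** Assumed to be called by a cleaner thread just before it sleeps or backs off. */
    void markIdle() {
        busyThreads.decrementAndGet();
    }

    /** Number of threads currently marked busy. */
    int busyCount() {
        return busyThreads.get();
    }

    /** Busy threads as a percentage of the configured pool size. */
    double busyPercent() {
        return 100.0 * busyThreads.get() / totalThreads;
    }
}
```

Under these assumptions, a gauge reading `busyPercent()` that stays near 100 would suggest the pool is saturated and `log.cleaner.threads` should be raised, whereas a high `max-compaction-delay-secs` combined with a low busy percentage would point to another cause.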