Details
- Type: Bug
- Status: Resolved
- Priority: Major
- Resolution: Fixed
- 0.10.0.0, 0.10.0.1
- None
- None
Description
Kafka reports several metrics derived from the state of partitions:
- UnderReplicatedPartitions
- PreferredReplicaImbalanceCount
- OfflinePartitionsCount
All of these metrics fire when topics are rapidly created and deleted in a tight loop. The only partitions responsible belong to topics that are in the middle of creation or deletion; the cluster is otherwise stable.
Looking through the source code, topic deletion goes through an asynchronous state machine: https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/controller/TopicDeletionManager.scala#L35.
However, the metrics do not know about the progress of this state machine: https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/controller/KafkaController.scala#L185
I believe the fix is relatively simple: the metrics need to know when a topic is currently undergoing deletion or creation, and should only include topics that are "stable".
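To make the proposal concrete, here is a minimal Python sketch of the filtering idea. It is not Kafka's controller code: `PartitionState`, `topics_in_transition`, and the three counting functions are hypothetical names invented for illustration, and the definitions of "offline", "under-replicated", and "imbalanced" are simplified assumptions.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List, Tuple


@dataclass(frozen=True)
class PartitionState:
    """Hypothetical snapshot of one partition as the metrics code might see it."""
    topic: str
    partition: int
    replicas: Tuple[int, ...]  # assigned replica broker ids, preferred replica first
    isr: Tuple[int, ...]       # in-sync replica broker ids
    leader: int                # leader broker id, or -1 if there is none


def stable_partitions(partitions: Iterable[PartitionState],
                      topics_in_transition: FrozenSet[str]) -> List[PartitionState]:
    # Skip partitions of topics that are still being created or deleted.
    return [p for p in partitions if p.topic not in topics_in_transition]


def offline_partitions_count(partitions, live_brokers, topics_in_transition=frozenset()):
    # Assumption: a partition is offline when its leader is not a live broker.
    return sum(1 for p in stable_partitions(partitions, topics_in_transition)
               if p.leader not in live_brokers)


def under_replicated_partitions(partitions, topics_in_transition=frozenset()):
    # Assumption: under-replicated means fewer in-sync replicas than assigned replicas.
    return sum(1 for p in stable_partitions(partitions, topics_in_transition)
               if len(p.isr) < len(p.replicas))


def preferred_replica_imbalance_count(partitions, topics_in_transition=frozenset()):
    # Assumption: imbalanced means the leader is not the first assigned replica.
    return sum(1 for p in stable_partitions(partitions, topics_in_transition)
               if p.replicas and p.leader != p.replicas[0])
```

With an empty `topics_in_transition` set, the functions count every partition, which mirrors the behavior described in this report. Passing the set of topics currently queued for creation or deletion excludes their partitions, so only "stable" topics contribute to the metrics.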