Description
The current RemoteReadErrorsPerSec metric does not count errors of type OffsetOutOfRangeException.
In our testing of the tiered storage feature (at Apple), we saw several cases where remote downloads were affected and got stuck because of repeated OffsetOutOfRangeException errors on particular brokers or topic partitions. The root cause can vary, but without a metric it is currently very hard to catch this issue and debug it in a timely fashion. The exception itself may not be the root cause, but a metric for it would be a good signal for us to alert on and investigate.
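To illustrate the request, here is a minimal, self-contained sketch of a remote-read wrapper that counts OffsetOutOfRangeException toward the error total instead of letting it bypass the counter. It does not reproduce Kafka's actual remote read code: the class `RemoteReadErrorMetricSketch`, the method `readWithErrorMetrics`, the local `OffsetOutOfRangeException` stand-in, and the two `LongAdder` counters (standing in for per-second meters such as RemoteReadErrorsPerSec) are all hypothetical names chosen for illustration.

```java
import java.io.IOException;
import java.util.concurrent.Callable;
import java.util.concurrent.atomic.LongAdder;

public class RemoteReadErrorMetricSketch {

    // Hypothetical stand-in for Kafka's OffsetOutOfRangeException,
    // defined locally so the sketch compiles on its own.
    static class OffsetOutOfRangeException extends RuntimeException {
        OffsetOutOfRangeException(String msg) {
            super(msg);
        }
    }

    // Hypothetical counters standing in for per-second meters.
    static final LongAdder remoteReadErrors = new LongAdder();
    static final LongAdder remoteReadOffsetOutOfRangeErrors = new LongAdder();

    // Runs a remote read and records failures. OffsetOutOfRangeException is
    // caught first so it can also get its own counter, but it still counts
    // toward the overall error total before being rethrown.
    static <T> T readWithErrorMetrics(Callable<T> remoteRead) throws Exception {
        try {
            return remoteRead.call();
        } catch (OffsetOutOfRangeException e) {
            remoteReadOffsetOutOfRangeErrors.increment();
            remoteReadErrors.increment();
            throw e;
        } catch (Exception e) {
            remoteReadErrors.increment();
            throw e;
        }
    }

    public static void main(String[] args) {
        // Simulate one successful read, one out-of-range read, one I/O failure.
        String[] outcomes = {"ok", "out-of-range", "io-error"};
        for (String outcome : outcomes) {
            try {
                readWithErrorMetrics(() -> {
                    if (outcome.equals("out-of-range")) {
                        throw new OffsetOutOfRangeException("offset not in remote log");
                    }
                    if (outcome.equals("io-error")) {
                        throw new IOException("remote storage unavailable");
                    }
                    return "records";
                });
            } catch (Exception e) {
                // A real caller would turn this into a fetch error response.
            }
        }
        System.out.println("RemoteReadErrors=" + remoteReadErrors.sum());
        System.out.println("OffsetOutOfRangeErrors=" + remoteReadOffsetOutOfRangeErrors.sum());
    }
}
```

Running `main` prints `RemoteReadErrors=2` and `OffsetOutOfRangeErrors=1`. Keeping a dedicated counter alongside the total is one possible design: it would let an alert target repeated out-of-range reads specifically, without hiding them from the overall error rate.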
Related discussion
https://github.com/apache/kafka/pull/13944#discussion_r1266243006
I am happy to contribute to this if the request is accepted.
Issue Links
- is a child of KAFKA-16947 Kafka Tiered Storage V2 (Open)