Description
The KafkaConsumer is a complex client that depends on many components to function properly. When a consumer misbehaves, it can be difficult to identify the root cause and which component (ConsumerCoordinator, Fetcher, ConsumerNetworkClient, etc.) is responsible.
This change aims to improve monitoring and failure detection for the KafkaConsumer’s Fetcher component.
The Fetcher sends a fetch request to each node for which the consumer has assigned partitions.
A fetch request may fail in the following cases:
- Intermittent network issues (goes to onFailure)
- Node sent an invalid full/incremental fetch response (FetchSessionHandler’s handleResponse returns false)
- FetchSessionIdNotFound
- InvalidFetchSessionEpochException
These cases are already logged, but logs alone are hard to alert on. A corresponding metric would make fetch request failures available for monitoring and alerting.
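
As a rough illustration of the idea, the failure cases listed above could each be counted separately so that dashboards and alerts can tell them apart. The sketch below uses only the Java standard library. The class name `FetchFailureMetrics`, the `Reason` enum, and the `record`, `count`, and `total` methods are hypothetical names chosen for this example. They are not part of Kafka's client or metrics API, and an actual implementation would presumably register a sensor with the consumer's existing metrics registry instead.

```java
import java.util.EnumMap;
import java.util.Map;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical sketch: not Kafka's API. Counts failed fetch requests by cause.
final class FetchFailureMetrics {

    // One entry per failure case named in the description (names are illustrative).
    enum Reason {
        NETWORK_FAILURE,              // request ended in onFailure
        INVALID_FETCH_RESPONSE,       // FetchSessionHandler's handleResponse returned false
        FETCH_SESSION_ID_NOT_FOUND,
        INVALID_FETCH_SESSION_EPOCH
    }

    // The map is fully populated in the constructor and never modified afterwards,
    // so concurrent reads are safe; LongAdder handles concurrent increments.
    private final Map<Reason, LongAdder> counters = new EnumMap<>(Reason.class);

    FetchFailureMetrics() {
        for (Reason reason : Reason.values()) {
            counters.put(reason, new LongAdder());
        }
    }

    // Call wherever the corresponding failure is currently only logged.
    void record(Reason reason) {
        counters.get(reason).increment();
    }

    // Number of failures recorded for a single cause.
    long count(Reason reason) {
        return counters.get(reason).sum();
    }

    // Number of failures recorded across all causes.
    long total() {
        long sum = 0;
        for (LongAdder counter : counters.values()) {
            sum += counter.sum();
        }
        return sum;
    }
}
```

Keeping one counter per cause, rather than a single aggregate, would let an alert distinguish a burst of network failures from repeated fetch session errors.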