Description
I have run into several issues in the past where heartbeat requests were not sent [1,2] (either not in time, or not at all), and today it is hard to determine that from the logs. I think we should add a metric such as "last-heartbeat-seconds-ago" so that, when a rebalance is triggered, we can immediately tell whether a missed heartbeat is the root cause.
1. https://issues.apache.org/jira/browse/KAFKA-10793
2. https://issues.apache.org/jira/browse/KAFKA-10827
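To make the proposal concrete, here is a rough sketch of how the value behind a "last-heartbeat-seconds-ago" gauge could be computed. It is plain Java with no Kafka dependencies; the class name `HeartbeatGauge`, its methods, and the -1 "never sent" sentinel are all hypothetical and only illustrate the idea, not how the coordinator code is or should be structured.

```java
import java.util.concurrent.TimeUnit;
import java.util.function.LongSupplier;

// Hypothetical helper, not part of Kafka's API: remembers when the last
// heartbeat request was sent and reports how long ago that was.
final class HeartbeatGauge {
    // Assumed sentinel meaning "no heartbeat has been sent yet".
    private static final long NEVER = -1L;

    // Injectable millisecond clock, so the gauge can be tested deterministically.
    private final LongSupplier clockMs;
    private volatile long lastSentMs = NEVER;

    HeartbeatGauge(LongSupplier clockMs) {
        this.clockMs = clockMs;
    }

    // Call this from wherever the heartbeat request is actually sent.
    void recordHeartbeatSent() {
        lastSentMs = clockMs.getAsLong();
    }

    // Gauge value: whole seconds since the last send, or -1 if none was ever sent.
    long lastHeartbeatSecondsAgo() {
        long sent = lastSentMs;
        if (sent == NEVER) {
            return NEVER;
        }
        return TimeUnit.MILLISECONDS.toSeconds(clockMs.getAsLong() - sent);
    }
}
```

In production the clock would presumably be `System::currentTimeMillis`. A value that keeps growing past the heartbeat interval right before a rebalance would then point to a missed heartbeat as the likely cause.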
Issue Links
- Is contained by: KAFKA-12352 Improve debuggability with continuous consumer rebalances (Open)