Details
- Bug
- Status: Resolved
- Major
- Resolution: Duplicate
- 2.4.1
- None
- None
- Kafka 2.4.1 on JRE 11 on Debian 9 in Docker
Description
Since upgrading a cluster from 1.1.0 to 2.4.1, the broker no longer correctly handles a consumer that sends a join after a leave.
What happens now is that if a consumer sends a leave and then, while it is shutting down, follows up by trying to send a join again, the group coordinator adds the leaving member to the group but never seems to heartbeat that member.
Since the consumer is then gone, when it joins again after starting it is added as a new member, but the zombie member is still there and is included in the partition assignment, which means that those partitions never get consumed. What can also happen is that one of the zombies becomes group leader, so the rebalance gets stuck forever and the group is entirely blocked.
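To make the failure mode concrete, here is a minimal toy model of the sequence described above. It is only a sketch and not Kafka's coordinator code: the `ToyGroup` class, its method names, and the round-robin assignment are all assumptions made for illustration. It shows how a member that is registered but never expired keeps receiving partitions that nobody consumes.

```python
from dataclasses import dataclass, field


@dataclass
class ToyGroup:
    """Hypothetical stand-in for a consumer group; NOT Kafka's coordinator logic."""
    partitions: int
    members: dict = field(default_factory=dict)  # member id -> is the process alive?
    _next_id: int = 0

    def join(self, alive=True):
        # Register a member; alive=False models a join sent by a process that then exits.
        member_id = f"member-{self._next_id}"
        self._next_id += 1
        self.members[member_id] = alive
        return member_id

    def leave(self, member_id):
        self.members.pop(member_id, None)

    def assign(self):
        # Round-robin over every registered member, whether or not it is alive.
        ids = sorted(self.members)
        assignment = {m: [] for m in ids}
        for p in range(self.partitions):
            assignment[ids[p % len(ids)]].append(p)
        return assignment

    def unconsumed(self):
        # Partitions handed to members whose process no longer exists.
        return sorted(
            p
            for m, parts in self.assign().items()
            if not self.members[m]
            for p in parts
        )


group = ToyGroup(partitions=6)
a = group.join()
b = group.join()

# Consumer "a" shuts down: it sends a leave, then a stray join on its way out.
group.leave(a)
zombie = group.join(alive=False)  # registered, but never heartbeated or expired

# The same consumer restarts and is added as a brand-new member.
a_restarted = group.join()

assignment = group.assign()
stuck = group.unconsumed()
print(assignment)
print(stuck)  # [1, 4]
```

In this toy run the zombie holds a third of the partitions. In the real report the equivalent partitions stayed unconsumed until the group's consumers were stopped and the coordinating broker was restarted.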
I have not been able to track down where this was introduced between 1.1.0 and 2.4.1, but I will look into it further. Unfortunately the logs are essentially silent about the zombie members, and I only had INFO level logging on during the issue. By stopping all the consumers in the group and restarting the broker coordinating that group, we could get back to a working state.
Attachments
Issue Links
- duplicates
  - KAFKA-9752 Consumer rebalance can be stuck after new member timeout with old JoinGroup version (Resolved)
- is a clone of
  - KAFKA-9935 Kafka not releasing member from Consumer Group (Resolved)