Details
Description
We have seen recently multiple consumer groups stuck in `CompletingRebalance`. It appears that those group never receives the assignment from the leader of the group and remains stuck in this state forever.
When a group transitions to the `CompletingRebalance` state, the group coordinator sets up `DelayedHeartbeat` for each member of the group. It does so to ensure that the member sends a sync request within the session timeout. If it does not, the group coordinator rebalances the group. Note that here, `DelayedHeartbeat` is used here for this purpose. `DelayedHeartbeat` are also completed when member heartbeats.
The issue is that https://github.com/apache/kafka/pull/8834 has changed the heartbeat logic to allow members to heartbeat while the group is in the `CompletingRebalance` state. This was not allowed before. Now, if a member starts to heartbeat while the group is in the `CompletingRebalance`, the heartbeat request will basically complete the pending `DelayedHeartbeat` that was setup previously for catching not receiving the sync request. Therefore, if the sync request never comes, the group coordinator does not notice anymore.
We need to bring that behavior back somehow.