Uploaded image for project: 'Kafka'
  1. Kafka
  2. KAFKA-12890

Consumer group stuck in `CompletingRebalance`

Attach filesAttach ScreenshotVotersWatch issueWatchersCreate sub-taskLinkCloneUpdate Comment AuthorReplace String in CommentUpdate Comment VisibilityDelete Comments
    XMLWordPrintableJSON

Details

    • Bug
    • Status: Resolved
    • Blocker
    • Resolution: Fixed
    • 2.7.0, 2.6.1, 2.8.0, 2.7.1, 2.6.2
    • 2.8.1, 3.0.0
    • None
    • None

    Description

      We have seen recently multiple consumer groups stuck in `CompletingRebalance`. It appears that those group never receives the assignment from the leader of the group and remains stuck in this state forever.

      When a group transitions to the `CompletingRebalance` state, the group coordinator sets up `DelayedHeartbeat` for each member of the group. It does so to ensure that the member sends a sync request within the session timeout. If it does not, the group coordinator rebalances the group. Note that here, `DelayedHeartbeat` is used here for this purpose. `DelayedHeartbeat` are also completed when member heartbeats.

      The issue is that https://github.com/apache/kafka/pull/8834 has changed the heartbeat logic to allow members to heartbeat while the group is in the `CompletingRebalance` state. This was not allowed before. Now, if a member starts to heartbeat while the group is in the `CompletingRebalance`, the heartbeat request will basically complete the pending `DelayedHeartbeat` that was setup previously for catching not receiving the sync request. Therefore, if the sync request never comes, the group coordinator does not notice anymore.

      We need to bring that behavior back somehow.

      Attachments

        Activity

          This comment will be Viewable by All Users Viewable by All Users
          Cancel

          People

            dajac David Jacot
            dajac David Jacot
            Jason Gustafson Jason Gustafson
            Votes:
            1 Vote for this issue
            Watchers:
            4 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved:

              Issue deployment