Kafka / KAFKA-9232

Coordinator new member heartbeat completion does not work for JoinGroup v3


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.1.1, 2.2.2, 2.4.0, 2.3.1
    • Fix Version/s: 2.1.2, 2.2.3, 2.3.2, 2.4.1
    • Component/s: None
    • Labels: None

    Description

      For older versions of the JoinGroup API, the coordinator applies a static 5-minute timeout to new members. This timeout is implemented using the heartbeat purgatory, and we expect the delayed operation to be force-completed if the member successfully joins. This is implemented in GroupCoordinator with the following logic:

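        // The join callback runs first, so the member is no longer awaiting join...
        group.maybeInvokeJoinCallback(member, joinResult)
        // ...by the time we try to complete the pending delayed heartbeat.
        completeAndScheduleNextHeartbeatExpiration(group, member)
        member.isNew = false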
      

      However, heartbeat completion depends on this check:

        def shouldKeepAlive(deadlineMs: Long): Boolean = {
          if (isAwaitingJoin)
            !isNew || latestHeartbeat + GroupCoordinator.NewMemberJoinTimeoutMs > deadlineMs
          else awaitingSyncCallback != null ||
            latestHeartbeat + sessionTimeoutMs > deadlineMs
        }
      

      Since the join callback is invoked first, the member is no longer awaiting join by the time the heartbeat is checked, so we fall into the second branch. That branch returns true only when a sync callback is pending or the latest heartbeat plus the session timeout exceeds the deadline. The deadline in this case depends only on the statically configured new member timeout, which means the heartbeat cannot complete until about 5 minutes have passed. If the member falls out of the group before then, the heartbeat ultimately expires, which may trigger a spurious rebalance.
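The effect of the ordering can be illustrated with a small standalone model. This is only a sketch, not the Kafka source: the `Member` case class, the `awaitingSync` flag (standing in for `awaitingSyncCallback != null`), and the 3-second join time and 10-second session timeout are hypothetical values chosen for illustration. It also assumes, as the description implies, that the delayed heartbeat can complete only when `shouldKeepAlive` returns true.

```scala
object NewMemberHeartbeatSketch {
  // Static new-member timeout of 5 minutes, as stated in the description.
  val NewMemberJoinTimeoutMs: Long = 5 * 60 * 1000L

  // Simplified stand-in for the coordinator's member state (hypothetical type).
  final case class Member(
    isNew: Boolean,
    isAwaitingJoin: Boolean,
    awaitingSync: Boolean,
    latestHeartbeat: Long,
    sessionTimeoutMs: Long
  ) {
    // Same shape as the check quoted in the description.
    def shouldKeepAlive(deadlineMs: Long): Boolean =
      if (isAwaitingJoin)
        !isNew || latestHeartbeat + NewMemberJoinTimeoutMs > deadlineMs
      else
        awaitingSync || latestHeartbeat + sessionTimeoutMs > deadlineMs
  }

  // The new member is added at t = 0, so its delayed heartbeat expires at t = 5 min.
  val deadlineMs: Long = 0L + NewMemberJoinTimeoutMs

  // Assumed scenario: the member joins 3 seconds later with a 10-second session timeout.
  val beforeCallback: Member = Member(
    isNew = true,
    isAwaitingJoin = true,
    awaitingSync = false,
    latestHeartbeat = 3000L,
    sessionTimeoutMs = 10000L
  )

  // Invoking the join callback means the member is no longer awaiting join.
  val afterCallback: Member = beforeCallback.copy(isAwaitingJoin = false)

  // First branch: 3000 + 300000 > 300000, so the check passes.
  val completesBeforeCallback: Boolean = beforeCallback.shouldKeepAlive(deadlineMs)

  // Second branch: 3000 + 10000 > 300000 is false, so the heartbeat stays pending.
  val completesAfterCallback: Boolean = afterCallback.shouldKeepAlive(deadlineMs)
}
```

In this model the check passes only while the member is still awaiting join, which matches the behavior described above: once the callback has run, the session timeout is compared against a deadline that was derived from the much larger new member timeout.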

      Newer versions of the protocol are not affected by this bug because we return immediately the first time a member joins the group.


          People

            ableegoldman A. Sophie Blee-Goldman
            hachikuji Jason Gustafson