[KAFKA-13891] sync group failed with rebalanceInProgress error causes many rounds of rebalance in cooperative - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: 3.0.0
Fix Version/s: 3.5.0, 3.4.1
Component/s: clients
Labels: None

Description

This issue was first found in KAFKA-13419.

However, the previous PR did not reset the generation when sync group failed with a REBALANCE_IN_PROGRESS error. As a result, the original bug still exists, and it can force a consumer through many rounds of rebalance before the group finally becomes stable.

Here's an example:

Consumer A joins and syncs the group successfully in generation 1 (with ownedPartition P1/P2).
A new rebalance starts with generation 2. Consumer A joins successfully but, for some reason, does not send its sync group request immediately.
The other consumers complete sync group successfully in generation 2; only consumer A does not.
By the time consumer A sends its sync group request, a new rebalance has started with generation 3, so consumer A receives a REBALANCE_IN_PROGRESS error in the sync group response.
On receiving REBALANCE_IN_PROGRESS, consumer A re-joins the group for generation 3, sending the assignment (ownedPartition) it held in generation 1.
The join request therefore carries an out-of-date ownedPartition, which leads to unexpected results.
After the generation-3 rebalance, consumer A is assigned P3/P4. Its ownedPartition is ignored because it comes from an old generation.
Consumer A revokes P1/P2 and re-joins, starting a new round of rebalance.
If some other consumer C fails to sync group before consumer A's joinGroup, the same issue happens again, resulting in many rounds of rebalance before the group is stable.
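
The steps above can be replayed with a small toy model. This is only a sketch under stated assumptions, not Kafka client code: `SyncGroupResetSketch`, `Member`, `replayScenario`, and `claimsAreStale` are hypothetical names introduced for illustration, and "resetting the generation" is modeled here as also dropping the partitions the member would claim in its next join request.

```java
import java.util.List;

public class SyncGroupResetSketch {

    // Hypothetical marker for "no valid generation"; not taken from the Kafka code base.
    static final int NO_GENERATION = -1;

    /** Toy stand-in for one cooperative consumer's membership state. */
    static final class Member {
        private final boolean resetOnSyncFailure;
        private int generation = NO_GENERATION;          // last generation synced successfully
        private List<String> ownedPartition = List.of(); // claims sent with the next join

        Member(boolean resetOnSyncFailure) {
            this.resetOnSyncFailure = resetOnSyncFailure;
        }

        void onSyncGroupSuccess(int syncedGeneration, List<String> assignment) {
            generation = syncedGeneration;
            ownedPartition = List.copyOf(assignment);
        }

        void onSyncGroupRebalanceInProgress() {
            if (resetOnSyncFailure) {
                // Assumption: a reset forgets both the generation and the old claims.
                generation = NO_GENERATION;
                ownedPartition = List.of();
            }
            // Without the reset, the generation-1 state is kept as-is.
        }

        List<String> ownedPartitionForNextJoin() {
            return ownedPartition;
        }

        /** Claims are stale if they predate the generation just before this rebalance. */
        boolean claimsAreStale(int rebalanceGeneration) {
            return !ownedPartition.isEmpty() && generation < rebalanceGeneration - 1;
        }
    }

    /** Replays consumer A's path up to the point where it re-joins for generation 3. */
    static Member replayScenario(boolean resetOnSyncFailure) {
        Member a = new Member(resetOnSyncFailure);
        a.onSyncGroupSuccess(1, List.of("P1", "P2")); // generation 1: synced with P1/P2
        a.onSyncGroupRebalanceInProgress();           // generation 2: sync group fails
        return a;
    }

    public static void main(String[] args) {
        Member withoutReset = replayScenario(false);
        Member withReset = replayScenario(true);
        System.out.println("without reset: owned=" + withoutReset.ownedPartitionForNextJoin()
                + " stale=" + withoutReset.claimsAreStale(3));
        System.out.println("with reset: owned=" + withReset.ownedPartitionForNextJoin()
                + " stale=" + withReset.claimsAreStale(3));
    }
}
```

In this model, the member that skips the reset re-joins for generation 3 still claiming P1/P2 from generation 1, which is the out-of-date ownedPartition described in the steps; the member that resets re-joins with no claims to be ignored.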

Issue Links

links to

GitHub Pull Request #12140

GitHub Pull Request #12794

GitHub Pull Request #13550

People

Assignee: Philip Nee

Reporter: Shawn Wang

Votes: 0

Watchers: 9

Dates

Created: 10/May/22 12:13

Updated: 05/May/23 00:12

Resolved: 03/May/23 16:58