Details
-
Bug
-
Status: Resolved
-
Blocker
-
Resolution: Fixed
-
None
Description
When a consumer unsubscribes, the app thread simply triggers an Unsubscribe event that will take care of it all in the background thread: release assignment (callbacks), clear assigned partitions, and send leave group HB.
On the contrary, when a consumer is closed, these actions happen in both threads:
- release assignment -> in the app thread by directly running the callbacks
- clear assignment -> in app thread by updating the subscriptionState
- send leave group HB -> in the background thread via an event LeaveOnClose
This situation could lead to race conditions, mainly because of the close updating the subscription state in the app thread, when other operations in the background could be already running based on it. Ex.
- unsubscribe in app thread (triggers background UnsubscribeEvent to revoke and leave)
- unsubscribe fails (ex. interrupted, leaving operation running in the background thread to revoke partitions and leave)
- consumer close (will revoke and clear assignment in the app thread)
- UnsubscribeEvent in the background may fail by trying to revoke partitions that it does not own anymore - No current assignment for partition ...
A basic check has been added to the background thread revocation to avoid the race condition, ensuring that we only revoke partitions we own, but still we should avoid the root cause, which is updating the assignment on the app thread. We should consider having the close operation as a single LeaveOnClose event handled in the background. That even already takes cares of revoking the partitions and clearing assignment on the background, so no need to take care of it in the app thread. We should only ensure that we processBackgroundEvents until the LeaveOnClose completes (to allow for callbacks to run in the app thread)
Trying to understand the current approach, I imagine the initial motivation to have the callabacks (and assignment cleared) in the app thread was to avoid the back-and-forth: app thread close -> background thread leave event -> app thread to run callback -> background thread to clear assignment and send HB. But updating the assignment on the app thread ends up being problematic, as it mainly happens in the background so it opens up the door for race conditions on the subscription state.
Attachments
Issue Links
- fixes
-
KAFKA-16022 AsyncKafkaConsumer sometimes complains “No current assignment for partition {}”
- Closed
- is related to
-
KAFKA-16290 Investigate propagating subscription state updates via queues
- Resolved
- links to