Details
Type: Bug
Status: Open
Priority: Critical
Resolution: Unresolved
Affects Version/s: 3.8.0
None
None
Description
In a Kafka Streams application with multiple clients, each running multiple stream threads, when the app is restarted with empty local storage (without resetting the whole application), only some of the clients restore from their changelog topics.
Those non-restoring clients also cannot shut down gracefully.
Reproduction steps
> These are all the details I have for now. I'm going to build a project that reproduces the issue locally, and I'll link it in this ticket.
- Run the app in a Kubernetes environment with multiple pods (5), i.e. 5 Streams clients, and with enough data or little enough CPU that restoration takes a long time (long enough to see the issue after 1 or 2 minutes)
- Let the app consume the input topics until it is live (no lag on input or internal topics)
- Stop the app
- Clear out the local storage
- Restart the app: only 2 or 3 clients restore, while the others consume nothing
- Bonus: stop the clients; the stuck clients do not close, and keep sending heartbeats and responding to any rebalance assignment
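To make step 5 easier to observe across pods, one option is to have each client report a cumulative restored-record count (for example, summed from the `numRestored` argument of `StateRestoreListener.onBatchRestored`, registered via `KafkaStreams.setGlobalStateRestoreListener`) and compare two snapshots taken some time apart. The sketch below assumes such per-client counts are already collected somewhere; the class `RestoreProgress`, the method `stalledClients`, and the pod ids are hypothetical and not part of Kafka Streams. Treat the result only as a hint while restoration is expected to be ongoing (right after restarting with wiped storage), since a client that has already finished restoring also shows no progress.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeSet;
import java.util.stream.Collectors;

// Hypothetical helper, not a Kafka Streams API: compares two snapshots of
// cumulative restored-record counts keyed by client id (e.g. pod name).
final class RestoreProgress {
    private RestoreProgress() {}

    /**
     * Returns the client ids (sorted) whose restored-record count did not increase
     * between the two snapshots. A client missing from a snapshot counts as 0.
     */
    static List<String> stalledClients(Map<String, Long> before, Map<String, Long> after) {
        // Union of all client ids seen in either snapshot, kept in sorted order.
        TreeSet<String> clients = new TreeSet<>(before.keySet());
        clients.addAll(after.keySet());
        return clients.stream()
                .filter(id -> after.getOrDefault(id, 0L) <= before.getOrDefault(id, 0L))
                .collect(Collectors.toList());
    }
}
```

For instance, snapshots of `{pod-0=0, pod-1=0, pod-2=0}` and `{pod-0=120000, pod-1=0, pod-2=95000}` would flag only `pod-1` as a client that is not restoring.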
Related Slack discussion: https://confluentcommunity.slack.com/archives/C48AHTCUQ/p1728296887560369