Details
Type: Improvement
Status: Reopened
Priority: Minor
Resolution: Unresolved
Description
If we pass `KafkaServer.time` to `GroupCoordinator`[1], some Streams tests such as `QueryableStateIntegrationTest` fail semi-regularly. damianguy looked into it and described it as follows:
Looking at the sequence of events, one thread is stopped and hence leaves the group, triggering a rebalance, but the other thread doesn’t seem to get the memo, tries to commit, fails, and then it's game over.
So in the failing case, the one live thread is not getting a rebalance. This would happen during a `poll(..)`, right? However, I can see the thread polling many times after the other thread has shut down.
It tries to commit every time around the loop, so:
poll(..)
process(..)
maybeCommit(..)
and there is less than 10 ms between calls to `poll`.
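To make the symptom above concrete, here is a minimal Java sketch of that loop. Everything in it is hypothetical and for illustration only: `SketchConsumer`, `StaleMemberConsumer`, and `StreamLoopSketch` are not Kafka classes, and modelling the commit failure as a stale group generation is an assumption, not something confirmed in this ticket. The fake consumer never reports a rebalance from `poll`, so the first commit after the other member leaves fails even though polling continues.

```java
// Hypothetical stand-in for the consumer used by a stream thread;
// this is NOT the real KafkaConsumer API.
interface SketchConsumer {
    boolean poll(long timeoutMs); // true if a rebalance was observed during the poll
    void commit();                // throws IllegalStateException if the commit is rejected
}

// Fake that models the reported symptom (assumption): the group moves to a
// new generation, but poll never tells this member about the rebalance.
final class StaleMemberConsumer implements SketchConsumer {
    private final int memberGeneration = 1;
    private int groupGeneration = 1;
    int polls = 0;

    // Simulates the other thread shutting down and leaving the group.
    void otherMemberLeaves() { groupGeneration++; }

    @Override
    public boolean poll(long timeoutMs) {
        polls++;
        return false; // the rebalance is never surfaced to this member
    }

    @Override
    public void commit() {
        if (memberGeneration != groupGeneration) {
            throw new IllegalStateException("commit rejected: stale generation");
        }
    }
}

final class StreamLoopSketch {
    // Runs poll -> process -> maybeCommit and returns how many iterations
    // finished with a successful commit before the first failure.
    static int run(SketchConsumer consumer, int maxIterations) {
        int completed = 0;
        for (int i = 0; i < maxIterations; i++) {
            consumer.poll(10);
            // process(..) would go here
            try {
                consumer.commit();
            } catch (IllegalStateException e) {
                break; // "game over" for this thread
            }
            completed++;
        }
        return completed;
    }
}
```

In this sketch the loop keeps committing successfully until `otherMemberLeaves()` is called; after that, the next iteration polls once and then fails on commit.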
A theory was that the mock time was not advancing enough to trigger a rebalance in the group coordinator. However, the consumer is closed, so that should trigger a `LeaveGroup` request and it's unclear why a rebalance is not triggered for the live consumer.
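The mock-time theory can be illustrated with a small Java sketch. `ManualClock` and `Deadline` are hypothetical names and not Kafka's own classes; the sketch only assumes that a timeout is evaluated against an injected clock that moves when a test advances it explicitly. It does not explain why the `LeaveGroup` path fails to trigger a rebalance.

```java
// Hypothetical manually advanced clock, similar in spirit to a mock time
// source in tests; this is NOT Kafka's MockTime class.
final class ManualClock {
    private long nowMs = 0L;

    long milliseconds() { return nowMs; }

    // Time only moves when the test calls this explicitly.
    void advance(long ms) { nowMs += ms; }
}

// Hypothetical timeout that is evaluated against the injected clock.
final class Deadline {
    private final ManualClock clock;
    private final long expiresAtMs;

    Deadline(ManualClock clock, long timeoutMs) {
        this.clock = clock;
        this.expiresAtMs = clock.milliseconds() + timeoutMs;
    }

    boolean expired() { return clock.milliseconds() >= expiresAtMs; }
}
```

With such a clock, checking `expired()` any number of times in real time never returns true until something calls `advance` with at least the timeout, which is the behaviour the theory suspected in the coordinator.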
PR where this issue was first seen and discussed: https://github.com/apache/kafka/pull/2095
[1] https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/server/KafkaServer.scala#L222