Details
-
Improvement
-
Status: Resolved
-
Major
-
Resolution: Fixed
-
None
-
None
-
None
Description
`StreamsPartitionAssignor` is responsible to assign partitions and tasks to all StreamsThreads within an application.
While it ensures to not assign a single partition/task to two threads, there is limited verification about it. In particular, we had one incident for with a zombie thread/consumer did not cleanup its own internal state correctly due to KAFKA-12983. This unclean zombie-state implied that the old assignment reported to `StreamsPartitionAssignor` contained a single partition for two consumers. As a result, both threads/consumers later revoked the same partition and the zombie-thread could commit it's unclean work (even if it should have been fenced), leading to duplicate output under EOS_v2.
We should consider to add a check to `StreamsPartitionAssignor` if the old assignment is valid, ie, no partition should be missing and no partition should be assigned to two consumers. For this case, we should log the invalid old assignment and send an error code back to all consumer that indicates that they should shut down "unclean" (ie, without and flushing and no committing any offsets or transactions).
Attachments
Issue Links
- links to