Description
State stores lose state when tasks are reassigned under EOS with standby replicas and default acceptable lag.
I have observed that state stores used in a transform step under exactly-once semantics (EOS) end up losing state after a rebalance that reassigns tasks to a previous standby task that is within the acceptable standby lag.
The problem is reproducible, and an integration test has been created to showcase the issue.
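For readers trying to reproduce this, the configuration described above could look roughly like the sketch below. It is only an illustration under assumptions, not the configuration from the integration test: the application id, the bootstrap address, and the standby replica count of 1 are hypothetical, and `acceptable.recovery.lag` is written out with its default value of 10000 purely to make the default visible. Plain string keys are used so the snippet compiles without the Kafka Streams dependency; newer releases may require `exactly_once_v2` instead of `exactly_once`.

```java
import java.util.Properties;

public class EosStandbyReproConfig {

    // Builds a Kafka Streams configuration matching the scenario in this report:
    // EOS enabled, standby replicas enabled, default acceptable recovery lag.
    static Properties build() {
        Properties props = new Properties();
        // Hypothetical values, chosen only for illustration.
        props.setProperty("application.id", "eos-standby-repro");
        props.setProperty("bootstrap.servers", "localhost:9092");
        // Exactly-once processing guarantee (EOS).
        props.setProperty("processing.guarantee", "exactly_once");
        // Assumption: one standby replica per stateful task.
        props.setProperty("num.standby.replicas", "1");
        // Default value, spelled out explicitly; the report relies on the default.
        props.setProperty("acceptable.recovery.lag", "10000");
        return props;
    }

    public static void main(String[] args) {
        // Print the effective settings so they can be compared with a failing setup.
        build().forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```

The resulting `Properties` object can be passed to the `KafkaStreams` constructor together with the topology under test.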
A detailed description of the observed issue is provided here.
Similar issues have been observed and reported on Stack Overflow, for example here.