State stores lose state when tasks are reassigned under EOS with standby replicas and default acceptable lag.

I have observed that state stores used in a transform step under exactly-once semantics (EOS) lose state after a rebalance in which tasks are reassigned to the instances that previously hosted them as standby tasks, when those standbys are within the acceptable standby lag.
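For readers trying to reproduce the setup, the configuration described in the title might look roughly like the sketch below. It is a minimal sketch, assuming Kafka Streams 3.0+ (for `exactly_once_v2`) and a single standby replica; the issue only says "EOS" and "standby replicas", so the EOS variant and replica count are assumptions. The class name `EosStandbyConfig`, the application id and the broker address are hypothetical. The property keys are the documented string names of the `StreamsConfig` settings, so the snippet compiles without a Kafka dependency.

```java
import java.util.Properties;

/**
 * Hypothetical helper that builds Kafka Streams properties matching the
 * scenario in this issue: EOS enabled, one standby replica, and the default
 * acceptable recovery lag.
 */
public final class EosStandbyConfig {

    // Default value of acceptable.recovery.lag in Kafka Streams (offsets).
    static final long DEFAULT_ACCEPTABLE_RECOVERY_LAG = 10_000L;

    private EosStandbyConfig() {}

    static Properties build(String applicationId, String bootstrapServers) {
        Properties props = new Properties();
        props.put("application.id", applicationId);
        props.put("bootstrap.servers", bootstrapServers);
        // Exactly-once semantics; "exactly_once_v2" is an assumption here.
        props.put("processing.guarantee", "exactly_once_v2");
        // Keep one warm standby copy of each stateful task (assumed count).
        props.put("num.standby.replicas", "1");
        // Same as omitting the setting: an instance whose lag is within this many
        // offsets counts as caught up enough to receive an active task assignment.
        props.put("acceptable.recovery.lag", Long.toString(DEFAULT_ACCEPTABLE_RECOVERY_LAG));
        return props;
    }

    public static void main(String[] args) {
        Properties props = build("eos-standby-repro", "localhost:9092");
        props.forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```

Setting `acceptable.recovery.lag` explicitly to its default is only there to make the "default acceptable lag" condition from the title visible; leaving the key out behaves the same.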

The problem is reproducible, and an integration test has been created to demonstrate the issue.

A detailed description of the observed issue is provided here.

Similar issues have been observed and reported on Stack Overflow, for example here.

Issue Links

links to

GitHub Pull Request #13369

GitHub Pull Request #13725

People

Assignee:: Guozhang Wang

Reporter:: Martin Hørslev

Votes:: 1

Watchers:: 6

Dates

Created:: 19/Aug/22 09:01

Updated:: 22/Jun/23 13:02

Resolved:: 25/Apr/23 13:03