Details
-
Bug
-
Status: Open
-
P1
-
Resolution: Unresolved
-
2.26.0, 2.27.0, 2.28.0, 2.29.0, 2.30.0, 2.31.0, 2.32.0, 2.33.0, 2.34.0, 2.35.0, 2.36.0
-
None
-
None
Description
A user noticed that we commit Kafka offsets without any obvious checkpointing. We use a Reshuffle.byRandomKey() to cause Dataflow and the SparkRunner to checkpoint. But on runners with non-checkpointing shuffle, this risks data loss.
The modern solution is to use @RequiresStableInput. This is not perfectly/fully implemented across many runners, so we still need the explicit shuffle for now.
https://stackoverflow.com/questions/70785672/apache-beam-kafkaio-commit-offset-behaviour