Details
-
Bug
-
Status: Closed
-
Major
-
Resolution: Fixed
-
None
-
Added the option sink.committer.retries to specify the number of retries a Flink application attempts for committable operations (such as transactions) on retriable errors, as specified by the sink connector, before Flink fails and potentially restarts.
Description
Currently, we retry committables at some time later until they eventually succeed.
However, that violates the contract of notifyCheckpointCompleted which states that all side effect must be committed before returning the method. In particular, notifyCheckpointCompleted must fail if we cannot guarantee that all side effects are committed for final checkpoints. As soon as notifyCheckpointCompleted returns, the final checkpoint is deemed completed, which currently may mean that some transactions are still open.
The solution is that all retries must happen in a close loop in notifyCheckpointCompleted.