Details
New Feature
Status: Open
Not a Priority
Resolution: Unresolved
1.1.2, 1.1.3
None
Description
Currently, if Flink cannot complete a checkpoint, the job fails and goes through recovery.
To reduce the impact of less stable storage infrastructure on Flink's performance, Flink should be able to tolerate a certain number of failed checkpoints and simply keep executing.
This should be controllable via a parameter, for example:
env.getCheckpointConfig().setAllowedFailedCheckpoints(3);
A value of -1 could indicate that Flink tolerates an unlimited number of checkpoint failures.
The default value should still be 0, to keep compatibility with the existing behavior.
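To make the proposed semantics concrete, here is a minimal sketch of how such a tolerance counter might behave. It is not Flink code: the class `CheckpointFailureTolerance` and its methods are hypothetical names used only for illustration. It also assumes that failures are counted consecutively and that the counter resets on a successful checkpoint, which this issue does not specify.

```java
/**
 * Hypothetical sketch of the proposed tolerance policy; not part of Flink.
 * Assumption: only consecutive failures count, and a successful checkpoint
 * resets the counter. The issue itself leaves this open.
 */
final class CheckpointFailureTolerance {

    /** Sentinel from the proposal: tolerate any number of failed checkpoints. */
    static final int UNLIMITED = -1;

    private final int allowedFailedCheckpoints;
    private int failedCheckpoints;

    CheckpointFailureTolerance(int allowedFailedCheckpoints) {
        if (allowedFailedCheckpoints < UNLIMITED) {
            throw new IllegalArgumentException(
                    "allowedFailedCheckpoints must be -1 (unlimited) or >= 0");
        }
        this.allowedFailedCheckpoints = allowedFailedCheckpoints;
    }

    /**
     * Records one failed checkpoint.
     *
     * @return true if the tolerated number is exceeded and the job should
     *         fail and recover, false if it should keep executing
     */
    boolean onCheckpointFailure() {
        if (allowedFailedCheckpoints == UNLIMITED) {
            return false;
        }
        failedCheckpoints++;
        return failedCheckpoints > allowedFailedCheckpoints;
    }

    /** Records a completed checkpoint and resets the failure counter. */
    void onCheckpointSuccess() {
        failedCheckpoints = 0;
    }
}
```

With a limit of 0, the first failed checkpoint already exceeds the limit, which matches the existing behavior. With a limit of 3, the job keeps running through three failures and fails on the fourth.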