Details
Type: Sub-task
Status: Closed
Priority: Major
Resolution: Won't Do
Description
The restart strategies can use the exception information from FLINK-4816 to track how often a checkpoint restore has failed. After a certain number of consecutive failures, they should fall back to an earlier completed checkpoint as the recovery point.
It is open for discussion whether the restart strategies are the right place to implement this, or whether it is an orthogonal feature that belongs in the checkpoint coordinator (which knows how many checkpoints are available) or in a separate class altogether.
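To make the proposal concrete, the counting and fallback logic described above might look roughly like the following. This is only a sketch under assumptions: the class RestoreFallbackTracker, its methods, and the use of plain checkpoint IDs are all hypothetical and are not part of Flink's restart strategy or checkpoint coordinator API. It also assumes the list of completed checkpoints is known up front and ordered oldest first.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical helper (not a Flink class): counts consecutive restore
 * failures and steps back to an earlier completed checkpoint once a
 * threshold is reached.
 */
final class RestoreFallbackTracker {
    // Completed checkpoint IDs, ordered oldest first (assumption).
    private final List<Long> completedCheckpointIds;
    private final int maxConsecutiveFailures;
    private int consecutiveFailures = 0;
    // How many checkpoints we have stepped back from the latest one.
    private int stepsBack = 0;

    RestoreFallbackTracker(List<Long> completedCheckpointIds, int maxConsecutiveFailures) {
        if (completedCheckpointIds.isEmpty()) {
            throw new IllegalArgumentException("need at least one completed checkpoint");
        }
        if (maxConsecutiveFailures < 1) {
            throw new IllegalArgumentException("threshold must be at least 1");
        }
        this.completedCheckpointIds = new ArrayList<>(completedCheckpointIds);
        this.maxConsecutiveFailures = maxConsecutiveFailures;
    }

    /** Call when restoring from the current recovery point failed. */
    void onRestoreFailure() {
        consecutiveFailures++;
        boolean olderCheckpointExists = stepsBack < completedCheckpointIds.size() - 1;
        if (consecutiveFailures >= maxConsecutiveFailures && olderCheckpointExists) {
            stepsBack++;              // fall back to the next older checkpoint
            consecutiveFailures = 0;  // start counting again for the new recovery point
        }
    }

    /** Call when a restore succeeded; the failure streak is over. */
    void onRestoreSuccess() {
        consecutiveFailures = 0;
    }

    /** ID of the checkpoint the next restore attempt should use. */
    long currentRecoveryPoint() {
        int index = completedCheckpointIds.size() - 1 - stepsBack;
        return completedCheckpointIds.get(index);
    }
}
```

Keeping this logic in its own small class sidesteps the placement question: either a restart strategy or the checkpoint coordinator could own an instance, as long as the owner can supply the list of available checkpoints.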