Details
- Type: Sub-task
- Status: Closed
- Priority: Major
- Resolution: Won't Do
Description
The checkpoint coordinator should track the number of consecutive unsuccessful checkpoints.
If more than n checkpoints fail in a row, where n is a configured value, it should call fail() on the execution graph to trigger a recovery.
The design document is here: https://docs.google.com/document/d/1ce7RtecuTxcVUJlnU44hzcO2Dwq9g4Oyd8_biy94hJc/edit?usp=sharing
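The counting rule described above could look roughly like the following. This is a minimal sketch only, not Flink code: the class name `ConsecutiveFailureTracker`, its methods, and the `Runnable` callback standing in for fail() on the execution graph are all hypothetical. It also assumes that any successful checkpoint resets the count, and that the count is reset once a recovery has been triggered.

```java
import java.util.concurrent.atomic.AtomicInteger;

/**
 * Hypothetical sketch of the proposed behaviour; none of these names exist in Flink.
 * Counts consecutive checkpoint failures and triggers a callback once more than
 * the tolerated number have failed in a row.
 */
final class ConsecutiveFailureTracker {

    // The configured "n": how many consecutive failures are tolerated.
    private final int tolerableFailures;

    // Stand-in for calling fail() on the execution graph (assumption).
    private final Runnable failJob;

    private final AtomicInteger consecutiveFailures = new AtomicInteger(0);

    ConsecutiveFailureTracker(int tolerableFailures, Runnable failJob) {
        if (tolerableFailures < 0) {
            throw new IllegalArgumentException("tolerableFailures must be >= 0");
        }
        this.tolerableFailures = tolerableFailures;
        this.failJob = failJob;
    }

    /** A completed checkpoint breaks the streak (assumed reset policy). */
    void onCheckpointSuccess() {
        consecutiveFailures.set(0);
    }

    /** Records one failure; fires the callback when more than n have failed in a row. */
    void onCheckpointFailure() {
        if (consecutiveFailures.incrementAndGet() > tolerableFailures) {
            // Reset first so that counting starts fresh after recovery.
            consecutiveFailures.set(0);
            failJob.run();
        }
    }

    int getConsecutiveFailures() {
        return consecutiveFailures.get();
    }
}
```

With `tolerableFailures` set to 2, the first two consecutive failures are only counted and the third one invokes the callback. A success in between restarts the count at zero.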
Issue Links
- depends upon
  - FLINK-10724 Refactor failure handling in check point coordinator (Closed)
- is duplicated by
  - FLINK-10074 Allowable number of checkpoint failures (Closed)
  - FLINK-12364 Introduce a CheckpointFailureManager to centralized manage checkpoint failure (Closed)
- links to