[FLINK-4818] RestartStrategy should track how many failed restore attempts the same checkpoint has and fall back to earlier checkpoints - ASF JIRA

Details

Type: Sub-task
Status: Closed
Priority: Major
Resolution: Won't Do
Affects Version/s: None
Fix Version/s: None
Component/s: Runtime / Coordination
Labels: None

Description

The restart strategies can use the exception information from FLINK-4816 to keep track of how many times restoring from the same checkpoint has failed. After a certain number of consecutive failures, they should fall back to an earlier completed checkpoint as the recovery point.

It is open for discussion whether the restart strategies are the right place to implement this, or whether it is an orthogonal feature that belongs in the checkpoint coordinator (which knows how many checkpoints are available) or in a separate class altogether.
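The fallback behaviour described above can be sketched as a small standalone tracker, which corresponds to the "separate class" option. This is a minimal sketch under stated assumptions, not Flink code: `CheckpointFallbackTracker` and its methods are hypothetical names; it assumes checkpoint IDs increase over time, that the caller reports the outcome of every restore attempt (for example, based on the exception information from FLINK-4816), and that a successful restore resets the counter so that only consecutive failures count.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.OptionalLong;

/**
 * Hypothetical helper (not part of Flink's API): counts consecutive failed
 * restore attempts per completed checkpoint and, once a checkpoint reaches
 * maxConsecutiveFailures, skips it in favor of the next older one.
 */
final class CheckpointFallbackTracker {

    private final int maxConsecutiveFailures;
    // Completed checkpoint IDs, sorted ascending (oldest first).
    private final List<Long> completedCheckpoints = new ArrayList<>();
    // Consecutive failed restore attempts, keyed by checkpoint ID.
    private final Map<Long, Integer> failureCounts = new HashMap<>();

    CheckpointFallbackTracker(int maxConsecutiveFailures) {
        if (maxConsecutiveFailures < 1) {
            throw new IllegalArgumentException("maxConsecutiveFailures must be >= 1");
        }
        this.maxConsecutiveFailures = maxConsecutiveFailures;
    }

    /** Registers a completed checkpoint as a possible recovery point. */
    void addCompletedCheckpoint(long checkpointId) {
        if (!completedCheckpoints.contains(checkpointId)) {
            completedCheckpoints.add(checkpointId);
            Collections.sort(completedCheckpoints);
        }
    }

    /**
     * Returns the newest checkpoint that has not reached the failure limit,
     * or an empty result if every known checkpoint is exhausted.
     */
    OptionalLong selectCheckpointForRestore() {
        for (int i = completedCheckpoints.size() - 1; i >= 0; i--) {
            long id = completedCheckpoints.get(i);
            if (failureCounts.getOrDefault(id, 0) < maxConsecutiveFailures) {
                return OptionalLong.of(id);
            }
        }
        return OptionalLong.empty();
    }

    /** Records one more failed restore attempt from the given checkpoint. */
    void reportRestoreFailure(long checkpointId) {
        failureCounts.merge(checkpointId, 1, Integer::sum);
    }

    /** A successful restore clears the counter, so only consecutive failures count. */
    void reportRestoreSuccess(long checkpointId) {
        failureCounts.remove(checkpointId);
    }
}
```

With a limit of 2 and checkpoints 7, 8 and 9 available, two consecutive failed restores from checkpoint 9 make `selectCheckpointForRestore()` return 8. An empty result means every known checkpoint has hit the limit, and the caller would then have to decide whether to fail the job instead of restarting again.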

People

Assignee: Unassigned
Reporter: Stephan Ewen
Votes: 0
Watchers: 5

Dates

Created: 12/Oct/16 19:20
Updated: 30/Aug/19 10:24
Resolved: 30/Aug/19 10:24