Details
- Type: Improvement
- Status: Closed
- Priority: Major
- Resolution: Fixed
- None
Description
RestartStrategies can define their own restrictions on whether a job can be restarted. For example, they could count the total number of failures or observe failure rates.
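As an illustration of such a restriction, here is a minimal sketch of a failure-rate check. The class `FailureRateRestriction` and its methods are hypothetical and are not Flink's actual restart strategy classes; time is passed in explicitly to keep the example deterministic.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical, simplified failure-rate restriction; not Flink's actual API. */
final class FailureRateRestriction {
    private final int maxFailuresPerInterval;
    private final long intervalMillis;
    // Timestamps of recent failures, oldest first.
    private final Deque<Long> failureTimestamps = new ArrayDeque<>();

    FailureRateRestriction(int maxFailuresPerInterval, long intervalMillis) {
        this.maxFailuresPerInterval = maxFailuresPerInterval;
        this.intervalMillis = intervalMillis;
    }

    /** Records a failure observed at the given time. */
    void recordFailure(long nowMillis) {
        failureTimestamps.addLast(nowMillis);
    }

    /** Allows a restart while the failures inside the window stay within the limit. */
    boolean canRestart(long nowMillis) {
        // Drop failures that have fallen out of the observation window.
        while (!failureTimestamps.isEmpty()
                && nowMillis - failureTimestamps.peekFirst() > intervalMillis) {
            failureTimestamps.removeFirst();
        }
        return failureTimestamps.size() <= maxFailuresPerInterval;
    }
}
```

A restriction like this only works if every failure is recorded, which is why restarts that bypass the strategy undermine it.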
FailoverStrategies are used for partial restarts of jobs, and currently largely bypass the restrictions defined by the restart strategies.
My proposal is the following:
Introduce a new method on the RestartStrategy interface that notifies the strategy of failed task executions. Strategies currently handle this implicitly in RestartStrategy#restart, so migrating our existing strategies should be trivial.
Next, before calling RestartStrategy#restart, inform the strategy about the task failure. This retains existing behavior.
Additionally, FailoverStrategy implementations may inform the restart strategy about task failures if and when they perform a local failover. All implementations have to check RestartStrategy#canRestart before attempting a failover.
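The proposal can be sketched roughly as follows. This is a simplified stand-in and not Flink's actual interfaces: `SketchRestartStrategy`, `CountingRestartStrategy`, `LocalFailoverSketch`, and the method name `notifyFailure` are assumptions made for illustration, since the description does not name the new method, and `restart` takes a plain `Runnable` here rather than Flink's real parameters.

```java
/** Simplified stand-in for the RestartStrategy interface; signatures are assumptions. */
interface SketchRestartStrategy {
    boolean canRestart();

    /** Hypothetical name for the proposed failure notification method. */
    void notifyFailure(Throwable cause);

    void restart(Runnable restartAction);
}

/** Counts failures explicitly instead of implicitly inside restart(). */
final class CountingRestartStrategy implements SketchRestartStrategy {
    private final int maxFailures;
    private int failures;

    CountingRestartStrategy(int maxFailures) {
        this.maxFailures = maxFailures;
    }

    @Override
    public boolean canRestart() {
        return failures <= maxFailures;
    }

    @Override
    public void notifyFailure(Throwable cause) {
        failures++;
    }

    @Override
    public void restart(Runnable restartAction) {
        // No counting here any more; the failure was already reported.
        restartAction.run();
    }
}

/** Hypothetical local failover that respects the restart strategy. */
final class LocalFailoverSketch {
    private final SketchRestartStrategy restartStrategy;
    int localRestarts;

    LocalFailoverSketch(SketchRestartStrategy restartStrategy) {
        this.restartStrategy = restartStrategy;
    }

    /** Returns true if a local failover ran, false if the strategy forbids it. */
    boolean onTaskFailure(Throwable cause) {
        restartStrategy.notifyFailure(cause);
        if (!restartStrategy.canRestart()) {
            return false; // the caller would fail the job instead
        }
        localRestarts++; // stands in for restarting the affected tasks
        return true;
    }
}
```

With a limit of two failures, the first two task failures would trigger local failovers and the third would be rejected, so partial restarts are subject to the same restriction as full restarts.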
Issue Links
- causes FLINK-13452 Pipelined region failover strategy does not recover Job if checkpoint cannot be read (Closed)