Flink / FLINK-13060

FailoverStrategies should respect restart constraints

Details

    • Users who have enabled the "region" failover strategy together with a restart strategy that limits the number of restarts or introduces a restart delay will see changes in behavior. The "region" failover strategy now respects the constraints defined by the restart strategy.

    Description

      RestartStrategies can define their own restrictions on whether a job can be restarted. For example, they could count the total number of failures or observe failure rates.
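
      To illustrate what such a restriction can look like, here is a minimal sketch of a failure-rate limit. The class FailureRateLimit and its methods are hypothetical names chosen for this example and are not Flink's implementation; timestamps are passed in explicitly so the behavior is easy to follow.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical sketch, not Flink code: permits restarts only while the number
// of failures inside a sliding time window stays within a limit.
final class FailureRateLimit {
    private final int maxFailuresPerInterval;
    private final long intervalMillis;
    private final Deque<Long> failureTimestamps = new ArrayDeque<>();

    FailureRateLimit(int maxFailuresPerInterval, long intervalMillis) {
        this.maxFailuresPerInterval = maxFailuresPerInterval;
        this.intervalMillis = intervalMillis;
    }

    // Records one failure at the given time (milliseconds).
    void recordFailure(long nowMillis) {
        failureTimestamps.addLast(nowMillis);
    }

    // Drops failures older than the window, then checks the limit.
    boolean canRestart(long nowMillis) {
        while (!failureTimestamps.isEmpty()
                && nowMillis - failureTimestamps.peekFirst() > intervalMillis) {
            failureTimestamps.removeFirst();
        }
        return failureTimestamps.size() <= maxFailuresPerInterval;
    }
}
```

      With a limit of 2 failures per second, a third failure inside the same second makes canRestart return false until the older failures fall out of the window.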

      FailoverStrategies are used for partial restarts of jobs, and currently largely bypass the restrictions defined by the restart strategies.

      My proposal is the following:

      Introduce a new method into the RestartStrategy interface to notify the strategy of failed task executions. Currently, strategies handle this implicitly in RestartStrategy#restart, so migrating our existing strategies should be trivial.

      Next, inform the strategy about the task failure before calling RestartStrategy#restart. This retains the existing behavior.
      FailoverStrategy implementations may additionally inform the restart strategy about task failures if and when they perform a local failover. All implementations have to check RestartStrategy#canRestart before attempting a failover.
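
      The proposal could be sketched roughly as follows. This is a simplified illustration under assumptions, not the actual Flink interfaces: SketchRestartStrategy, FixedCountRestartStrategy, SketchRegionFailover and the method name notifyFailure are hypothetical, and only canRestart is taken from the description above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified stand-in for Flink's RestartStrategy interface.
interface SketchRestartStrategy {
    // Proposed addition (method name assumed): record one failed task execution.
    void notifyFailure(Throwable cause);

    // Mirrors RestartStrategy#canRestart from the description.
    boolean canRestart();
}

// Hypothetical strategy that tolerates a fixed total number of failures.
final class FixedCountRestartStrategy implements SketchRestartStrategy {
    private final int maxFailures;
    private int failures;

    FixedCountRestartStrategy(int maxFailures) {
        this.maxFailures = maxFailures;
    }

    @Override
    public void notifyFailure(Throwable cause) {
        failures++;
    }

    @Override
    public boolean canRestart() {
        return failures <= maxFailures;
    }
}

// Hypothetical region failover: reports the failure, then asks for permission.
final class SketchRegionFailover {
    private final SketchRestartStrategy restartStrategy;
    final List<String> restartedRegions = new ArrayList<>();

    SketchRegionFailover(SketchRestartStrategy restartStrategy) {
        this.restartStrategy = restartStrategy;
    }

    // Returns true if the region was restarted locally, false if the
    // restart strategy's constraints forbid another restart.
    boolean onTaskFailure(String region, Throwable cause) {
        restartStrategy.notifyFailure(cause);
        if (!restartStrategy.canRestart()) {
            return false; // a real implementation would fail the job here
        }
        restartedRegions.add(region);
        return true;
    }
}
```

      Because local failovers count against the same budget as full restarts in this sketch, a strategy allowing two failures rejects the third one even if every failure was handled by a region restart.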

            People

              chesnay Chesnay Schepler

                Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 20m