FLINK-13921

Simplify cluster level RestartStrategy configuration

Details

    Flink's cluster-level restart strategy configuration has been simplified: setting `restart-strategy.fixed-delay.attempts` or `restart-strategy.fixed-delay.delay` no longer overrides the default restart strategy. See https://ci.apache.org/projects/flink/flink-docs-master/dev/task_failure_recovery.html for more information.

    Description

      Currently, configuring RestartStrategies in Flink is complicated. The way they are configured evolved over time, and we kept each step backwards compatible. As a result, we currently have the following behaviour:

      • If the config option restart-strategy is configured, then Flink uses this RestartStrategy (so far so simple)
      • If the config option restart-strategy is not configured, then
        • If restart-strategy.fixed-delay.attempts or restart-strategy.fixed-delay.delay are defined, then instantiate FixedDelayRestartStrategy(restart-strategy.fixed-delay.attempts, restart-strategy.fixed-delay.delay)
        • If restart-strategy.fixed-delay.attempts and restart-strategy.fixed-delay.delay are not defined, then
          • If checkpointing is disabled, then choose NoRestartStrategy
          • If checkpointing is enabled, then choose FixedDelayRestartStrategy(Integer.MAX_VALUE, "0 s")
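
      The decision tree above can be sketched in a few lines of Java. This is only an illustration of the rules as listed, not Flink's actual implementation: the class `LegacyRestartStrategyResolution`, the method `resolve`, the use of a plain `Map` for the configuration, and the `<default>` placeholder (used when only one of the two fixed-delay options is set) are all assumptions made for the example.

```java
import java.util.Map;

// Hypothetical sketch of the current resolution rules; not Flink code.
final class LegacyRestartStrategyResolution {

    static final String ATTEMPTS_KEY = "restart-strategy.fixed-delay.attempts";
    static final String DELAY_KEY = "restart-strategy.fixed-delay.delay";

    // Returns a textual description of the restart strategy that would be chosen.
    static String resolve(Map<String, String> config, boolean checkpointingEnabled) {
        // Rule 1: an explicitly configured restart-strategy always wins.
        String configured = config.get("restart-strategy");
        if (configured != null) {
            return configured;
        }
        // Rule 2: either fixed-delay option on its own implies a fixed-delay strategy.
        if (config.containsKey(ATTEMPTS_KEY) || config.containsKey(DELAY_KEY)) {
            // "<default>" is a placeholder; the issue does not state the defaults.
            String attempts = config.getOrDefault(ATTEMPTS_KEY, "<default>");
            String delay = config.getOrDefault(DELAY_KEY, "<default>");
            return "FixedDelayRestartStrategy(" + attempts + ", " + delay + ")";
        }
        // Rule 3: otherwise the choice depends on checkpointing.
        return checkpointingEnabled
                ? "FixedDelayRestartStrategy(Integer.MAX_VALUE, \"0 s\")"
                : "NoRestartStrategy";
    }

    public static void main(String[] args) {
        // Only the attempts option is set and checkpointing is disabled.
        System.out.println(resolve(Map.of(ATTEMPTS_KEY, "3"), false));
    }
}
```

      The case in `main` is the one that makes the current behaviour surprising: a fixed-delay strategy is selected even though restart-strategy itself was never configured.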

      I would like to simplify the configuration by removing the "If restart-strategy.fixed-delay.attempts or restart-strategy.fixed-delay.delay are defined, then" condition. The logic would then be the following:

      • If the config option restart-strategy is configured, then Flink uses this RestartStrategy
      • If the config option restart-strategy is not configured, then
        • If checkpointing is disabled, then choose NoRestartStrategy
        • If checkpointing is enabled, then choose FixedDelayRestartStrategy(Integer.MAX_VALUE, "0 s")
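
      As a rough illustration of the proposed rules, the following Java sketch resolves a strategy from a configuration map and a checkpointing flag. The class `SimplifiedRestartStrategyResolution`, the method `resolve`, and the use of a plain `Map` are hypothetical names chosen for the example; they are not Flink APIs.

```java
import java.util.Map;

// Hypothetical sketch of the proposed resolution rules; not Flink code.
final class SimplifiedRestartStrategyResolution {

    // Returns a textual description of the restart strategy that would be chosen.
    static String resolve(Map<String, String> config, boolean checkpointingEnabled) {
        // An explicitly configured restart-strategy always wins.
        String configured = config.get("restart-strategy");
        if (configured != null) {
            return configured;
        }
        // restart-strategy.fixed-delay.* options are not consulted here:
        // without restart-strategy, only checkpointing decides.
        return checkpointingEnabled
                ? "FixedDelayRestartStrategy(Integer.MAX_VALUE, \"0 s\")"
                : "NoRestartStrategy";
    }

    public static void main(String[] args) {
        // A fixed-delay option without restart-strategy no longer has any effect.
        Map<String, String> onlyAttempts =
                Map.of("restart-strategy.fixed-delay.attempts", "3");
        System.out.println(resolve(onlyAttempts, false));
    }
}
```

      Under these rules, a configuration that sets only a fixed-delay option resolves exactly as an empty configuration would.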

      That way, we retain the user-friendly behaviour that jobs restart once checkpointing is enabled, and we make it clear that any restart-strategy.fixed-delay setting is only respected if restart-strategy has been set to fixed-delay.

      This simplification would, however, change Flink's behaviour and might break existing setups. Since we introduced RestartStrategies with Flink 1.0.0 and deprecated the prior configuration mechanism, which enables restarting if either the attempts or the delay has been set, I think that the number of broken jobs should be minimal, if not zero.

            People

              trohrmann Till Rohrmann