  1. Flink
  2. FLINK-22929

Change the default failover strategy to FixedDelayRestartStrategy for batch jobs

    Details

      Description

      Currently, the default failover strategy is:

      1. Stream job without checkpointing: NoRestartStrategy
      2. Stream job with checkpointing: FixedDelayRestartStrategy, as configured in this method
      3. Batch job: NoRestartStrategy

       

      This default is reasonable for stream jobs: without checkpoints, a stream job cannot restart without paying a high cost. For batch jobs, however, failover is handled via persisted intermediate result partitions, and users usually expect a batch job to finish normally by default (as in other batch processing systems). It therefore seems more reasonable for batch jobs to use the same default failover strategy as stream jobs with checkpointing enabled, namely FixedDelayRestartStrategy.
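
      The current and proposed defaults can be summarized in a small, self-contained model. This is only a sketch for illustration: the class, enum, and method names below are hypothetical and are not Flink's actual classes or its implementation. It encodes nothing beyond the three cases listed in this description plus the proposed change for batch jobs.

```java
// Hypothetical model of the default selection described in this issue.
// None of these names are Flink APIs; they only mirror the three cases above.
class DefaultRestartStrategySketch {

    enum JobType { STREAMING, BATCH }

    enum RestartStrategy { NO_RESTART, FIXED_DELAY }

    // Current behavior: only streaming jobs with checkpointing enabled
    // get a fixed-delay restart; every other job does not restart.
    static RestartStrategy currentDefault(JobType type, boolean checkpointingEnabled) {
        if (type == JobType.STREAMING && checkpointingEnabled) {
            return RestartStrategy.FIXED_DELAY;
        }
        return RestartStrategy.NO_RESTART;
    }

    // Proposed behavior: batch jobs also default to fixed-delay restart;
    // streaming jobs keep their current defaults.
    static RestartStrategy proposedDefault(JobType type, boolean checkpointingEnabled) {
        if (type == JobType.BATCH) {
            return RestartStrategy.FIXED_DELAY;
        }
        return currentDefault(type, checkpointingEnabled);
    }

    public static void main(String[] args) {
        // Print the current and proposed default for every combination.
        for (JobType type : JobType.values()) {
            for (boolean checkpointing : new boolean[] {false, true}) {
                System.out.println(type + ", checkpointing=" + checkpointing
                        + ": current=" + currentDefault(type, checkpointing)
                        + ", proposed=" + proposedDefault(type, checkpointing));
            }
        }
    }
}
```

      In this model the only case whose result changes is the batch job, which is the scope of the proposal; both streaming cases are unaffected.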

       

      Some users have also reported related issues.

            People

            • Assignee:
              Unassigned
              Reporter:
              gaoyunhaii Yun Gao
            • Votes:
              0
              Watchers:
              2

              Dates

              • Created:
                Updated: