Flink: FLINK-23553, sub-task of FLINK-2491 (Support Checkpoints After Tasks Finished)

Trigger global failover for synchronous savepoints on CheckpointCoordinator


Details

    • Type: Sub-task
    • Status: Closed
    • Priority: Major
    • Resolution: Not A Problem
    • 1.11.3, 1.13.1, 1.12.4
    • None
    • None

    Description

      We should trigger a global job failover if a stop-with-savepoint --drain fails.

      The situation is clear-cut for the drain mode. If the savepoint fails, we cannot simply continue, because we have already flushed all data and prepared the state for finishing. We cannot just go back to processing records.

      It is more debatable for the mode without drain, where we could theoretically continue processing records. However, unifying the two modes is also a good approach.
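      The reasoning in the two paragraphs above can be condensed into a small decision function. This is a hypothetical sketch: SavepointMode, FailureAction and SyncSavepointFailurePolicy are illustrative names, not Flink classes. It only encodes that a failed synchronous savepoint leads to a global failover in both modes, for different reasons.

```java
// Hypothetical sketch: these types are illustrative and are not part of Flink's API.
enum SavepointMode { DRAIN, NO_DRAIN }

enum FailureAction { TRIGGER_GLOBAL_FAILOVER, CONTINUE_PROCESSING }

final class SyncSavepointFailurePolicy {
    private SyncSavepointFailurePolicy() {}

    /** Decides how to react when a stop-with-savepoint (synchronous savepoint) fails. */
    static FailureAction onSavepointFailure(SavepointMode mode) {
        switch (mode) {
            case DRAIN:
                // All data has been flushed and the state prepared for finishing,
                // so the tasks cannot return to processing records.
                return FailureAction.TRIGGER_GLOBAL_FAILOVER;
            case NO_DRAIN:
                // Continuing would be possible in theory, but the two modes are
                // unified and fail over the same way.
                return FailureAction.TRIGGER_GLOBAL_FAILOVER;
            default:
                throw new IllegalArgumentException("Unknown mode: " + mode);
        }
    }
}
```

      Keeping the no-drain branch separate, even though it returns the same action, leaves room to revisit that decision later without touching the drain case.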

      This task is about triggering the failover in the CheckpointCoordinator. We should make sure that, once a synchronous checkpoint has been triggered, no newer checkpoints are scheduled.
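      A minimal sketch of that guard, assuming a simplified coordinator that hands out checkpoint ids from a single counter. SyncSavepointAwareCoordinator and its methods are hypothetical and do not reflect the real CheckpointCoordinator API; the point is only that, once a synchronous savepoint has been triggered, periodic triggers are refused instead of creating newer checkpoints.

```java
import java.util.OptionalLong;

// Hypothetical, heavily simplified coordinator; not Flink's CheckpointCoordinator.
final class SyncSavepointAwareCoordinator {
    private long nextCheckpointId = 1;
    // Id of the synchronous savepoint once one has been triggered, otherwise null.
    // In this sketch it is never cleared: the job either finishes or fails over,
    // and resetting state for a restarted job is out of scope.
    private Long pendingSyncSavepointId = null;

    /** Triggers a synchronous savepoint (stop-with-savepoint) and blocks newer checkpoints. */
    synchronized long triggerSynchronousSavepoint() {
        if (pendingSyncSavepointId != null) {
            throw new IllegalStateException(
                    "A synchronous savepoint is already in progress: " + pendingSyncSavepointId);
        }
        pendingSyncSavepointId = nextCheckpointId++;
        return pendingSyncSavepointId;
    }

    /** Periodic trigger; returns empty when a synchronous savepoint forbids newer checkpoints. */
    synchronized OptionalLong tryTriggerPeriodicCheckpoint() {
        if (pendingSyncSavepointId != null) {
            return OptionalLong.empty();
        }
        return OptionalLong.of(nextCheckpointId++);
    }
}
```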

      If a synchronous savepoint fails for whatever reason, we should trigger a global failover for the job.
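      One way to wire the failover to a failed synchronous savepoint, assuming the savepoint result is exposed as a CompletableFuture<String> holding the savepoint path and that the scheduler offers some callback that fails the whole job. SyncSavepointFailover and the triggerGlobalFailover callback are hypothetical stand-ins; only the java.util.concurrent and java.util.function types are real API here.

```java
import java.util.concurrent.CompletableFuture;
import java.util.function.Consumer;

// Hypothetical sketch; the callback stands in for whatever the scheduler uses
// to restart the whole job and is not a real Flink signature.
final class SyncSavepointFailover {
    private SyncSavepointFailover() {}

    /**
     * Attaches a handler to the savepoint future: any failure, whatever its cause,
     * triggers a global failover. A successful savepoint passes through untouched.
     */
    static CompletableFuture<String> withGlobalFailover(
            CompletableFuture<String> savepointFuture, Consumer<Throwable> triggerGlobalFailover) {
        return savepointFuture.whenComplete((savepointPath, failure) -> {
            if (failure != null) {
                triggerGlobalFailover.accept(failure);
            }
        });
    }
}
```

      Reacting to every failure instead of inspecting the cause matches the "for whatever reason" wording: there is no failure mode of a synchronous savepoint that is tolerated.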

      In a follow-up ticket, we might add safety guards (checkState calls) on the Task for situations we missed.
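      The safety guards could look roughly like the following on the Task side. TaskCheckpointGuard and its methods are hypothetical; the local checkState helper mirrors the behaviour of Flink's org.apache.flink.util.Preconditions.checkState (throwing IllegalStateException when the condition is false) and is defined inline only so the sketch compiles without dependencies.

```java
// Hypothetical Task-side guard; method names are illustrative, not Flink's Task API.
final class TaskCheckpointGuard {
    // -1 means no synchronous savepoint has been triggered on this task yet.
    private long syncSavepointId = -1;

    // Same contract as Preconditions.checkState(boolean, Object): fail fast on a violated invariant.
    private static void checkState(boolean condition, String message) {
        if (!condition) {
            throw new IllegalStateException(message);
        }
    }

    void onSynchronousSavepointTriggered(long checkpointId) {
        checkState(syncSavepointId == -1,
                "A synchronous savepoint has already been triggered: " + syncSavepointId);
        syncSavepointId = checkpointId;
    }

    void onCheckpointTriggered(long checkpointId) {
        // After a synchronous savepoint, no newer checkpoint may reach the task.
        checkState(syncSavepointId == -1 || checkpointId <= syncSavepointId,
                "Checkpoint " + checkpointId + " triggered after synchronous savepoint " + syncSavepointId);
        // A real task would continue with the regular checkpoint here.
    }
}
```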

            People

              Assignee: Unassigned
              Reporter: Dawid Wysakowicz (dwysakowicz)
              Votes: 0
              Watchers: 3