  1. Flink
  2. FLINK-2491 Support Checkpoints After Tasks Finished
  3. FLINK-23553

Trigger global failover for synchronous savepoints on CheckpointCoordinator

    Details

    • Type: Sub-task
    • Status: Closed
    • Priority: Major
    • Resolution: Not A Problem
    • Affects Version/s: 1.11.3, 1.13.1, 1.12.4
    • Fix Version/s: None
    • Labels:
      None

      Description

      We should trigger a global job failover if a stop-with-savepoint --drain fails.

      The situation is clear-cut in drain mode. If the savepoint fails, we cannot simply continue processing records, because we have already flushed all data and prepared the state for finishing.

      It is more debatable without drain, where we could theoretically continue processing records. However, unifying the behavior of the two modes is also a good approach.

      This task is about triggering the failover on the CheckpointCoordinator. We should make sure that once a synchronous checkpoint has been triggered, no newer checkpoints are scheduled.

      If a synchronous savepoint fails for whatever reason, we should trigger a global failover for the job.
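
      The two requirements above (no newer checkpoints after a synchronous savepoint is triggered, and a global failover when it fails) could be sketched roughly as follows. This is a minimal, self-contained illustration and not Flink's actual CheckpointCoordinator: the class SyncSavepointCoordinatorSketch, the GlobalFailoverHandler interface, and all method names are hypothetical.

```java
/** Hypothetical stand-in for illustration only; not Flink's CheckpointCoordinator. */
final class SyncSavepointCoordinatorSketch {

    /** Hypothetical callback that would restart the whole job. */
    interface GlobalFailoverHandler {
        void failGlobally(Throwable cause);
    }

    private final GlobalFailoverHandler failoverHandler;
    private boolean syncSavepointInProgress;
    private long nextCheckpointId = 1;

    SyncSavepointCoordinatorSketch(GlobalFailoverHandler failoverHandler) {
        this.failoverHandler = failoverHandler;
    }

    /**
     * Triggers a checkpoint and returns its id, or -1 if the request is
     * declined because a synchronous savepoint is already in progress.
     */
    synchronized long triggerCheckpoint(boolean synchronousSavepoint) {
        if (syncSavepointInProgress) {
            // No newer checkpoint may be scheduled after a synchronous savepoint.
            return -1L;
        }
        syncSavepointInProgress = synchronousSavepoint;
        return nextCheckpointId++;
    }

    /** Reports a failed checkpoint; only synchronous savepoints escalate. */
    synchronized void onCheckpointFailed(boolean synchronousSavepoint, Throwable cause) {
        if (synchronousSavepoint) {
            syncSavepointInProgress = false;
            // The tasks cannot resume after the savepoint failed, so restart the job.
            failoverHandler.failGlobally(cause);
        }
        // A failed regular checkpoint is simply tolerated in this sketch.
    }
}
```

      In this sketch the same path is taken with and without drain, which mirrors the proposal to unify the two modes.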

      We might add safety guards (checkState calls on the Task for situations we missed) in a follow-up ticket.
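
      As a rough idea of what such a guard might look like, the sketch below rejects records that arrive after a synchronous savepoint has started. Which situations actually need guarding is left open by this ticket, so the chosen condition is an assumption. TaskGuardSketch and its methods are hypothetical, and the local checkState helper only mimics the usual precondition style instead of depending on Flink.

```java
/** Hypothetical task-side guard for illustration only; not Flink's Task. */
final class TaskGuardSketch {

    private boolean syncSavepointStarted;
    private int processedRecords;

    /** Local helper: throws IllegalStateException if the condition is false. */
    static void checkState(boolean condition, String message) {
        if (!condition) {
            throw new IllegalStateException(message);
        }
    }

    /** Marks that a synchronous savepoint has started on this task. */
    void onSyncSavepointStarted() {
        syncSavepointStarted = true;
    }

    /** Processes one record; fails fast if a synchronous savepoint already started. */
    void processRecord(String record) {
        // Assumed guard: no record may be processed once the savepoint has begun.
        checkState(!syncSavepointStarted,
                "Record received after a synchronous savepoint was started");
        processedRecords++;
    }

    int processedRecords() {
        return processedRecords;
    }
}
```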

              People

              • Assignee:
                Unassigned
                Reporter:
                dwysakowicz Dawid Wysakowicz
              • Votes:
                0
                Watchers:
                3

                Dates

                • Created:
                  Updated:
                  Resolved: