Details
- Sub-task
- Status: Closed
- Major
- Resolution: Not A Problem
- 1.11.3, 1.13.1, 1.12.4
- None
- None
Description
We should trigger a global job failover if a stop-with-savepoint --drain fails.
The situation is clear-cut in drain mode: if the savepoint fails, we cannot simply continue processing records, because we have already flushed all data and prepared the state for finishing.
The case without drain is more debatable, since we could theoretically continue processing records. However, unifying the behavior of the two modes is also a good approach.
This task is about triggering the failover on the CheckpointCoordinator. We should make sure that once a synchronous checkpoint has been triggered, no newer checkpoints are scheduled.
If a synchronous savepoint fails for whatever reason, we should trigger a global failover for the job.
We might add safety guards (checkState calls for situations we missed) on the Task in a follow-up ticket.
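The intended behavior can be illustrated with a small, self-contained sketch. It is not Flink code: `SketchCheckpointCoordinator`, its methods, and the failover callback are all hypothetical names made up for illustration, and the real CheckpointCoordinator is considerably more involved. The sketch only models the two rules from this ticket: no newer checkpoint is scheduled while a synchronous savepoint is pending, and a failed synchronous savepoint triggers a global failover.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical, simplified stand-in for a checkpoint coordinator; NOT Flink's actual class. */
class SketchCheckpointCoordinator {
    // Assumption: a single flag is enough to model "a synchronous savepoint is pending".
    private boolean synchronousSavepointPending = false;
    // Assumption: global failover is modeled as a callback that receives the failure cause.
    private final Consumer<Throwable> globalFailoverCallback;

    SketchCheckpointCoordinator(Consumer<Throwable> globalFailoverCallback) {
        this.globalFailoverCallback = globalFailoverCallback;
    }

    /** Starts a stop-with-savepoint; returns false if one is already pending. */
    boolean triggerSynchronousSavepoint() {
        if (synchronousSavepointPending) {
            return false;
        }
        synchronousSavepointPending = true;
        return true;
    }

    /** Periodic checkpoints are declined while a synchronous savepoint is pending. */
    boolean triggerPeriodicCheckpoint() {
        return !synchronousSavepointPending;
    }

    /** A failed synchronous savepoint triggers a global failover, with or without drain. */
    void onSynchronousSavepointFailed(Throwable cause) {
        if (!synchronousSavepointPending) {
            return; // nothing pending, nothing to fail over
        }
        synchronousSavepointPending = false;
        globalFailoverCallback.accept(cause);
    }

    /** Small demonstration of the two rules. */
    public static void main(String[] args) {
        List<Throwable> failovers = new ArrayList<>();
        SketchCheckpointCoordinator coordinator = new SketchCheckpointCoordinator(failovers::add);

        coordinator.triggerSynchronousSavepoint();
        System.out.println("periodic checkpoint accepted: " + coordinator.triggerPeriodicCheckpoint());

        coordinator.onSynchronousSavepointFailed(new RuntimeException("savepoint failed"));
        System.out.println("global failovers triggered: " + failovers.size());
    }
}
```

In this sketch the pending flag is cleared when the failover is triggered, on the assumption that the restarted job resumes regular checkpointing. How the real coordinator resets its state after a failover is not covered by this ticket.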
Attachments
Issue Links
- is related to
- FLINK-23606 Add safety guards in StreamTask(s) if a global failover for a synchronous savepoint should've happen
- Closed