Details
- Type: Bug
- Status: Closed
- Priority: Critical
- Resolution: Fixed
- 1.14.0
Description
With ENABLE_CHECKPOINTS_AFTER_TASKS_FINISH enabled, the final checkpoint can deadlock (or time out after a very long time) when selecting the tasks to trigger the checkpoint on races with tasks finishing. FLINK-21246 was supposed to handle this, but it doesn't work as expected, because the futures from:
org.apache.flink.runtime.taskexecutor.TaskExecutor#triggerCheckpoint
and
org.apache.flink.streaming.runtime.tasks.StreamTask#triggerCheckpointAsync
are not linked together. TaskExecutor#triggerCheckpoint reports that the checkpoint has been successfully triggered, while the StreamTask might actually have finished already.
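
The mismatch can be illustrated with a minimal, self-contained sketch. It does not use Flink's classes: `FakeTask`, `triggerUnlinked`, and `triggerLinked` are hypothetical names, and the sketch assumes the task-side trigger returns a future that completes with `false` when the task has already finished. The linked variant shows one possible way to propagate that result, not necessarily what the merged fix does.

```java
import java.util.concurrent.CompletableFuture;

public class UnlinkedTriggerSketch {

    /** Hypothetical stand-in for a task; this is NOT Flink's StreamTask. */
    static final class FakeTask {
        private final boolean finished;

        FakeTask(boolean finished) {
            this.finished = finished;
        }

        /** Assumption: completes with false if the task had already finished. */
        CompletableFuture<Boolean> triggerCheckpointAsync(long checkpointId) {
            return CompletableFuture.completedFuture(!finished);
        }
    }

    /** Unlinked: the task-side future is dropped and success is reported regardless. */
    static CompletableFuture<String> triggerUnlinked(FakeTask task, long checkpointId) {
        task.triggerCheckpointAsync(checkpointId); // result ignored
        return CompletableFuture.completedFuture("ACK");
    }

    /** Linked: the returned future fails when the task reports it was not triggered. */
    static CompletableFuture<String> triggerLinked(FakeTask task, long checkpointId) {
        return task.triggerCheckpointAsync(checkpointId)
                .thenApply(triggered -> {
                    if (!triggered) {
                        throw new IllegalStateException(
                                "Task finished before checkpoint " + checkpointId + " was triggered");
                    }
                    return "ACK";
                });
    }

    public static void main(String[] args) {
        FakeTask finishedTask = new FakeTask(true);
        // The unlinked caller sees success even though nothing was triggered.
        System.out.println("unlinked failed: "
                + triggerUnlinked(finishedTask, 1L).isCompletedExceptionally());
        // The linked caller sees the failure and can decline the checkpoint.
        System.out.println("linked failed: "
                + triggerLinked(finishedTask, 1L).isCompletedExceptionally());
    }
}
```

In the unlinked case the caller has no signal that the checkpoint will never be acknowledged by the finished task, which matches the deadlock or long timeout described above.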
Issue Links
- is related to
  - FLINK-21246 Decline Checkpoint if some tasks finished before get triggered (Closed)
- links to
Merged to master as cc417cd0636 and edaf75ee072