[FLINK-13497] Checkpoints can complete after CheckpointFailureManager fails job - ASF JIRA

Details

Type: Bug
Status: Closed
Priority: Critical
Resolution: Fixed
Affects Version/s: 1.9.0, 1.10.0
Fix Version/s: 1.11.0
Component/s: Runtime / Checkpointing
Labels: None

Description

I think that with ~~FLINK-12364~~ we introduced an inconsistency with respect to job termination and checkpointing. In ~~FLINK-9900~~ it was discovered that checkpoints can complete even after the CheckpointFailureManager has decided to fail a job. I think the expected behaviour should be that we fail all pending checkpoints once the CheckpointFailureManager decides to fail the job.
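
The expected behaviour described above can be illustrated with a minimal, self-contained sketch. This is a toy model, not Flink's actual code: the class `ToyCheckpointCoordinator`, its methods, and the `tolerableFailures` threshold are all hypothetical names chosen for illustration. It shows one possible way to make the "fail job" decision and the abort of all pending checkpoints a single step, so that a late acknowledgement can no longer complete a checkpoint.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Toy model only; these are NOT Flink's real classes or signatures. */
final class ToyCheckpointCoordinator {
    private final int tolerableFailures;              // hypothetical threshold
    private int failureCount = 0;
    private boolean jobFailed = false;
    private final Set<Long> pending = new LinkedHashSet<>();
    private final List<Long> completed = new ArrayList<>();
    private final List<Long> aborted = new ArrayList<>();

    ToyCheckpointCoordinator(int tolerableFailures) {
        this.tolerableFailures = tolerableFailures;
    }

    /** Registers a pending checkpoint; refused once the job has been failed. */
    synchronized boolean triggerCheckpoint(long id) {
        if (jobFailed) {
            return false;
        }
        pending.add(id);
        return true;
    }

    /** Completes a pending checkpoint; late acks after a job failure are ignored. */
    synchronized boolean acknowledge(long id) {
        if (jobFailed || !pending.remove(id)) {
            return false;
        }
        completed.add(id);
        return true;
    }

    /** Records a checkpoint failure and fails the job once the threshold is exceeded. */
    synchronized void reportFailure(long id) {
        if (pending.remove(id)) {
            aborted.add(id);
        }
        failureCount++;
        if (!jobFailed && failureCount > tolerableFailures) {
            failJob();
        }
    }

    /** Failing the job and aborting every pending checkpoint happen together. */
    private void failJob() {
        jobFailed = true;
        aborted.addAll(pending);
        pending.clear();
    }

    synchronized boolean isJobFailed() { return jobFailed; }
    synchronized List<Long> completedIds() { return new ArrayList<>(completed); }
    synchronized List<Long> abortedIds() { return new ArrayList<>(aborted); }
}
```

In this sketch, a coordinator created with `new ToyCheckpointCoordinator(0)` fails the job on the first reported failure. Any checkpoint still pending at that moment is moved to the aborted list, and a subsequent `acknowledge` call for it returns `false` instead of completing it.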

Issue Links

causes

FLINK-9900 Fix unstable test ZooKeeperHighAvailabilityITCase#testRestoreBehaviourWithFaultyStateHandles

Closed

is caused by

FLINK-13698 Rework threading model of CheckpointCoordinator

Reopened

FLINK-16945 Execute CheckpointFailureManager.FailJobCallback directly in main thread executor

Closed

is fixed by

FLINK-16945 Execute CheckpointFailureManager.FailJobCallback directly in main thread executor

Closed

is related to

FLINK-5960 Make CheckpointCoordinator less blocking

Closed

relates to

FLINK-13527 Instable KafkaProducerExactlyOnceITCase due to CheckpointFailureManager

Closed

People

Assignee: Biao Liu

Reporter: Till Rohrmann

Votes: 0

Watchers: 10

Dates

Created: 30/Jul/19 14:34

Updated: 08/Apr/20 10:21

Resolved: 08/Apr/20 10:20