[FLINK-26606] CompletedCheckpoints that failed to be discarded are not stored in the CompletedCheckpointStore - ASF JIRA

Details

Type: Bug
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: 1.15.0
Fix Version/s: None
Component/s: Runtime / Checkpointing, Runtime / Coordination
Labels: None

Description

We introduced a repeatable per-job cleanup that runs after a job reaches a globally terminal state. This cleanup also tries to clean up the CompletedCheckpointStore. However, we missed one code path: when the CheckpointsCleaner tries to discard a CompletedCheckpoint and the discard fails, the CompletedCheckpointStore no longer holds a reference to that CompletedCheckpoint. As a result, the shutdown at the end of the job cannot clean these checkpoints up.

We should not remove the CompletedCheckpoints from the CompletedCheckpointStore if the deletion failed. This would enable us to retry deleting these artifacts at the end of the job and consider them in the retryable cleanup as well.
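
To illustrate the proposed behavior, here is a minimal, self-contained sketch. It does not use Flink's actual classes: `DiscardableCheckpoint`, `RetainingCheckpointStore`, and `FlakyCheckpoint` are hypothetical names invented for this example, and the real CompletedCheckpointStore and CheckpointsCleaner interact differently (for instance, discarding happens asynchronously). The only point shown is that a subsumed checkpoint stays referenced until its discard has succeeded, so that a later retry or the final shutdown can still reach it.

```java
import java.io.IOException;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;

/** Hypothetical stand-in for a completed checkpoint; not Flink's CompletedCheckpoint. */
interface DiscardableCheckpoint {
    long id();

    void discard() throws Exception;
}

/** Hypothetical store that forgets a checkpoint only after its discard succeeded. */
final class RetainingCheckpointStore {
    private final int maxRetained;
    private final Deque<DiscardableCheckpoint> retained = new ArrayDeque<>();
    // Subsumed checkpoints whose artifacts have not been deleted successfully yet.
    private final List<DiscardableCheckpoint> pendingDiscard = new ArrayList<>();

    RetainingCheckpointStore(int maxRetained) {
        this.maxRetained = maxRetained;
    }

    /** Adds a checkpoint and tries to discard the ones it subsumes. */
    void add(DiscardableCheckpoint checkpoint) {
        retained.addLast(checkpoint);
        while (retained.size() > maxRetained) {
            pendingDiscard.add(retained.removeFirst());
        }
        retryPendingDiscards();
    }

    /** Retries all outstanding discards; returns how many are still outstanding. */
    int retryPendingDiscards() {
        Iterator<DiscardableCheckpoint> it = pendingDiscard.iterator();
        while (it.hasNext()) {
            DiscardableCheckpoint checkpoint = it.next();
            try {
                checkpoint.discard();
                it.remove(); // drop the reference only after a successful discard
            } catch (Exception e) {
                // Keep the reference so that a later retry can still find it.
            }
        }
        return pendingDiscard.size();
    }

    /** Discards everything; returns false if a retryable cleanup should call it again. */
    boolean shutdown() {
        pendingDiscard.addAll(retained);
        retained.clear();
        return retryPendingDiscards() == 0;
    }

    int pendingDiscardCount() {
        return pendingDiscard.size();
    }
}

/** Test double whose discard fails a configurable number of times. */
final class FlakyCheckpoint implements DiscardableCheckpoint {
    private final long id;
    private int failuresLeft;
    private boolean discarded;

    FlakyCheckpoint(long id, int failuresLeft) {
        this.id = id;
        this.failuresLeft = failuresLeft;
    }

    @Override
    public long id() {
        return id;
    }

    @Override
    public void discard() throws Exception {
        if (failuresLeft > 0) {
            failuresLeft--;
            throw new IOException("simulated deletion failure for checkpoint " + id);
        }
        discarded = true;
    }

    boolean isDiscarded() {
        return discarded;
    }
}
```

In this sketch, `shutdown()` returning `false` is the signal a retryable cleanup could use to schedule another attempt. It also assumes that calling `discard()` again after a failure is safe, which is what the idempotency sub-task of this issue is about.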

The documentation was updated to cover this issue. Fixing this issue should also include removing the corresponding paragraph from the documentation (see related flink-docs PR).

Issue Links

Discovered while testing: FLINK-26388 Release Testing: Repeatable Cleanup (FLINK-25433) (Resolved)
Is caused by: FLINK-25432 Introduce common interfaces for cleaning up local and global job data (Resolved)
Relates to: FLINK-26742 DefaultCompletedCheckpointStore.shutdown does not clean the checkpoints atomically (Closed)

Sub-Tasks

1.	CompletedCheckpoint.DiscardObject.discard is not idempotent	Open	Wencong Liu
2.	FsCompletedCheckpointStorageLocation.disposeStorageLocation doesn't expose errors properly (not processing the return value)	Open	Unassigned
3.	IncrementalRemoteKeyedStateHandle.discardState swallows errors	Open	Unassigned
4.	OperatorSubtaskState swallows exception	Open	Unassigned
5.	SubtaskState#discardState swallows exceptions	Open	Unassigned

People

Assignee: Unassigned
Reporter: Matthias Pohl
Votes: 0
Watchers: 6

Dates

Created: 11/Mar/22 10:40
Updated: 03/Apr/23 07:02