Details
- Bug
- Status: Closed
- Critical
- Resolution: Fixed
- 1.1.3, 1.2.0
- None
Description
If a checkpoint is declined or expires, the CheckpointCoordinator disposes the PendingCheckpoint. Disposing the PendingCheckpoint discards all SubtaskStates registered so far by the tasks that have already acknowledged. However, acknowledge messages that arrive late are simply ignored, and the state handles they carry are never discarded. This can clutter the checkpoint directory, because the checkpoint files belonging to late or unknown acknowledge messages are never deleted.
I propose that the CheckpointCoordinator properly discard the state handles when it receives a late acknowledge message, or an acknowledge message for an unknown ExecutionAttemptID, that belongs to the CheckpointCoordinator's job. Checkpoint messages belonging to a different job will not be handled and will simply be ignored.
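
The proposed handling could look roughly like the sketch below. It is a simplified illustration, not the actual CheckpointCoordinator code: the class `SketchCheckpointCoordinator`, the interface `DiscardableState`, the use of plain strings for job and execution attempt IDs, and all method names are assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical stand-in for a state handle whose backing files can be deleted. */
interface DiscardableState {
    void discard();
}

/** Simplified, hypothetical coordinator; names and signatures are assumptions. */
class SketchCheckpointCoordinator {
    private final String jobId;
    private final Set<String> knownAttemptIds;
    // checkpointId -> states registered so far for a still-pending checkpoint
    private final Map<Long, List<DiscardableState>> pending = new HashMap<>();

    SketchCheckpointCoordinator(String jobId, Set<String> knownAttemptIds) {
        this.jobId = jobId;
        this.knownAttemptIds = new HashSet<>(knownAttemptIds);
    }

    void startCheckpoint(long checkpointId) {
        pending.put(checkpointId, new ArrayList<>());
    }

    /** Checkpoint declined or expired: discard everything registered so far. */
    void abortCheckpoint(long checkpointId) {
        List<DiscardableState> states = pending.remove(checkpointId);
        if (states != null) {
            for (DiscardableState state : states) {
                state.discard();
            }
        }
    }

    /** Returns true only if the state was registered with a pending checkpoint. */
    boolean receiveAcknowledge(
            String ackJobId, String attemptId, long checkpointId, DiscardableState state) {
        if (!jobId.equals(ackJobId)) {
            // Message for a different job: ignore it and leave its state untouched.
            return false;
        }
        List<DiscardableState> states = pending.get(checkpointId);
        if (states == null || !knownAttemptIds.contains(attemptId)) {
            // Late acknowledge (checkpoint already disposed) or unknown execution
            // attempt: discard the transmitted state so its files do not linger.
            state.discard();
            return false;
        }
        states.add(state);
        return true;
    }
}
```

The key design choice in this sketch is that the job check comes first: state from another job is never discarded here, while any state that belongs to this job but cannot be attached to a pending checkpoint is cleaned up immediately.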