  1. Flink
  2. FLINK-5063

State handles are not properly cleaned up for declined or expired checkpoints

Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 1.1.3, 1.2.0
    • Fix Version/s: 1.1.4, 1.2.0
    • None

    Description

      If a checkpoint is declined or expires, the CheckpointCoordinator disposes of the PendingCheckpoint. Disposing of the PendingCheckpoint discards all SubtaskStates registered so far by the Tasks that have acknowledged. However, acknowledge messages that arrive late are simply ignored, and the state handles they carry are never discarded. This can clutter the checkpoint directory, because the checkpoint files referenced by late or unknown acknowledge messages are never deleted.

      I propose that the CheckpointCoordinator properly discard the state handles when it receives a late acknowledge message, or an acknowledge message for an unknown ExecutionAttemptID, that belongs to the CheckpointCoordinator's own job. Checkpoint messages belonging to a different job will not be handled and are simply ignored.
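
      The proposed handling could be sketched roughly as follows. This is a simplified, self-contained model and not Flink's actual code: the classes StateHandle, AckMessage, PendingCheckpoint and Coordinator below, along with their methods and the use of plain strings for job and execution attempt IDs, are hypothetical stand-ins chosen for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical, simplified model; none of these classes are Flink's real ones.
final class LateAckCleanupSketch {

    /** Stand-in for a state handle that owns checkpoint files. */
    static final class StateHandle {
        final String path;
        boolean discarded;

        StateHandle(String path) { this.path = path; }

        // A real handle would delete its checkpoint files here.
        void discard() { discarded = true; }
    }

    /** Stand-in for an acknowledge message sent by a task. */
    static final class AckMessage {
        final String jobId;
        final String executionAttemptId;
        final long checkpointId;
        final StateHandle state;

        AckMessage(String jobId, String executionAttemptId, long checkpointId, StateHandle state) {
            this.jobId = jobId;
            this.executionAttemptId = executionAttemptId;
            this.checkpointId = checkpointId;
            this.state = state;
        }
    }

    /** Collects the state of tasks that acknowledged an ongoing checkpoint. */
    static final class PendingCheckpoint {
        private final Set<String> notYetAcknowledged;
        private final List<StateHandle> registered = new ArrayList<>();

        PendingCheckpoint(Set<String> expectedAttempts) {
            this.notYetAcknowledged = new HashSet<>(expectedAttempts);
        }

        // Returns false for an unknown or already acknowledged attempt.
        boolean acknowledge(String attemptId, StateHandle state) {
            if (!notYetAcknowledged.remove(attemptId)) {
                return false;
            }
            if (state != null) {
                registered.add(state);
            }
            return true;
        }

        // Discards everything registered so far.
        void dispose() {
            for (StateHandle handle : registered) {
                handle.discard();
            }
            registered.clear();
        }
    }

    static final class Coordinator {
        private final String jobId;
        private final Map<Long, PendingCheckpoint> pending = new HashMap<>();

        Coordinator(String jobId) { this.jobId = jobId; }

        void trigger(long checkpointId, Set<String> expectedAttempts) {
            pending.put(checkpointId, new PendingCheckpoint(expectedAttempts));
        }

        // Called when a checkpoint is declined or expires.
        void abort(long checkpointId) {
            PendingCheckpoint checkpoint = pending.remove(checkpointId);
            if (checkpoint != null) {
                checkpoint.dispose();
            }
        }

        // Returns true only if the message was accepted into a pending checkpoint.
        boolean receiveAck(AckMessage msg) {
            if (!jobId.equals(msg.jobId)) {
                return false; // different job: ignore, leave its state untouched
            }
            PendingCheckpoint checkpoint = pending.get(msg.checkpointId);
            if (checkpoint != null && checkpoint.acknowledge(msg.executionAttemptId, msg.state)) {
                return true;
            }
            // Late message or unknown execution attempt of our own job:
            // discard the transmitted state instead of silently dropping it.
            if (msg.state != null) {
                msg.state.discard();
            }
            return false;
        }
    }
}
```

      In this sketch the job check comes first, so state belonging to another job is never touched, while every rejected message of the coordinator's own job has its state discarded before the message is dropped.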

          People

            Assignee: trohrmann Till Rohrmann
            Reporter: trohrmann Till Rohrmann