Flink / FLINK-26606

CompletedCheckpoints that failed to be discarded are not stored in the CompletedCheckpointStore


Details

    Description

      We introduced a repeatable per-job cleanup that runs after the job has reached a globally-terminated state. It also tries to clean up the CompletedCheckpointStore. However, we missed one code path, in which the CheckpointsCleaner tries to discard CompletedCheckpoints. By that point, the CompletedCheckpointStore no longer holds any references to these CompletedCheckpoints, so the shutdown at the end of the job cannot clean them up if the discard failed.

      We should not remove a CompletedCheckpoint from the CompletedCheckpointStore if its deletion failed. That would allow us to retry deleting these artifacts at the end of the job and to include them in the retryable cleanup as well.
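
      The proposed behavior could be sketched roughly as follows. This is only an illustration under stated assumptions, not Flink's actual code: RetainingCheckpointStore, Checkpoint, and the Predicate-based discarder are hypothetical stand-ins for the CompletedCheckpointStore, CompletedCheckpoint, and CheckpointsCleaner, and the real discard logic is asynchronous and more involved.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;
import java.util.function.Predicate;

/** Hypothetical, simplified store; NOT Flink's CompletedCheckpointStore API. */
class RetainingCheckpointStore {

    /** Hypothetical minimal checkpoint handle, identified only by its ID. */
    static final class Checkpoint {
        final long id;

        Checkpoint(long id) {
            this.id = id;
        }
    }

    // Oldest checkpoint first. Checkpoints whose discard failed stay in here.
    private final Deque<Checkpoint> checkpoints = new ArrayDeque<>();
    private final int maxRetained;

    RetainingCheckpointStore(int maxRetained) {
        this.maxRetained = maxRetained;
    }

    /** Adds a checkpoint and tries to discard the oldest ones beyond the retention limit. */
    void add(Checkpoint checkpoint, Predicate<Checkpoint> discarder) {
        checkpoints.addLast(checkpoint);
        int excess = checkpoints.size() - maxRetained;
        Iterator<Checkpoint> it = checkpoints.iterator();
        while (excess > 0 && it.hasNext()) {
            Checkpoint oldest = it.next();
            // Drop the reference only if the discard succeeded.
            if (tryDiscard(oldest, discarder)) {
                it.remove();
            }
            excess--;
        }
    }

    /**
     * Retries discarding everything that is still referenced (assumes a
     * globally-terminated job). Returns the checkpoints that still could not
     * be discarded so that a retryable cleanup can pick them up again.
     */
    List<Checkpoint> shutdown(Predicate<Checkpoint> discarder) {
        List<Checkpoint> failed = new ArrayList<>();
        for (Checkpoint checkpoint : checkpoints) {
            if (!tryDiscard(checkpoint, discarder)) {
                failed.add(checkpoint);
            }
        }
        checkpoints.clear();
        checkpoints.addAll(failed);
        return failed;
    }

    int size() {
        return checkpoints.size();
    }

    // The discarder returns true on success; an exception counts as a failure.
    private static boolean tryDiscard(Checkpoint checkpoint, Predicate<Checkpoint> discarder) {
        try {
            return discarder.test(checkpoint);
        } catch (RuntimeException e) {
            return false;
        }
    }
}
```

      In this sketch, a checkpoint whose discard fails during add() remains in the store past the retention limit, so a later add() or the final shutdown() gets another chance to delete it instead of leaving an orphaned artifact.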

      The documentation was updated to describe this limitation. The fix for this issue should also remove the corresponding paragraph from the documentation (see related flink-docs PR).

          People

            Unassigned
            Matthias Pohl (mapohl)
