Parent issue: FLINK-4815 Automatic fallback to earlier checkpoints when checkpoint restore fails
Issue: FLINK-7783

Don't always remove checkpoints in ZooKeeperCompletedCheckpointStore#recover()

Details

    • Type: Sub-task
    • Status: Closed
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 1.3.2, 1.4.0
    • Fix Version/s: 1.3.3, 1.4.0
    • None

    Description

      Currently, we always delete checkpoint handles if they (or the data from the DFS) cannot be read: https://github.com/apache/flink/blob/91a4b276171afb760bfff9ccf30593e648e91dfb/flink-runtime/src/main/java/org/apache/flink/runtime/checkpoint/ZooKeeperCompletedCheckpointStore.java#L180

      This can cause problems if the DFS is temporarily unavailable: we could inadvertently
      delete all checkpoints even though they are still valid.

      A user reported this problem on the mailing list: https://lists.apache.org/thread.html/9dc9b719cf8449067ad01114fedb75d1beac7b4dff171acdcc24903d@%3Cuser.flink.apache.org%3E
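
The failure mode described above can be illustrated with a small, self-contained sketch. It is not the Flink code and does not use Flink or ZooKeeper APIs: `StateHandle`, `recoverAndDiscard` and `recoverAndKeep` are hypothetical names, and an in-memory map stands in for the checkpoint store. The first method mirrors the reported behaviour (an unreadable handle is dropped); the second shows one possible alternative, which skips the handle but leaves it in the store so that it can be retried once the DFS is reachable again.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

class CheckpointRecoverySketch {

    /** Hypothetical stand-in for a checkpoint handle whose data lives on the DFS. */
    interface StateHandle {
        String retrieve() throws IOException;
    }

    /** Reported behaviour: a handle that cannot be read is removed from the store. */
    static List<String> recoverAndDiscard(Map<String, StateHandle> store) {
        List<String> recovered = new ArrayList<>();
        Iterator<Map.Entry<String, StateHandle>> it = store.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, StateHandle> entry = it.next();
            try {
                recovered.add(entry.getValue().retrieve());
            } catch (IOException e) {
                // The entry is dropped even if the failure is only transient.
                it.remove();
            }
        }
        return recovered;
    }

    /** Assumed alternative: skip unreadable handles but keep them in the store. */
    static List<String> recoverAndKeep(Map<String, StateHandle> store) {
        List<String> recovered = new ArrayList<>();
        for (StateHandle handle : store.values()) {
            try {
                recovered.add(handle.retrieve());
            } catch (IOException e) {
                // Leave the entry untouched; a later recovery attempt may succeed.
            }
        }
        return recovered;
    }
}
```

With a store in which every handle throws an `IOException`, `recoverAndDiscard` leaves the store empty, while `recoverAndKeep` returns no checkpoints but preserves every entry.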

            People

              Assignee: Aljoscha Krettek
              Reporter: Aljoscha Krettek