Flink / FLINK-22494

Avoid discarding checkpoints in case of failure

Details

    Description

      Both StateHandleStore implementations (i.e. KubernetesStateHandleStore:157 and ZooKeeperStateHandleStore:170) discard the checkpoint when writing the checkpoint metadata to the backend fails.

      This does not cover cases where the data was actually written to the backend but the call still failed (e.g. due to network issues). In such a case, the backend can end up holding a pointer to a checkpoint that has already been discarded.

      Instead of discarding the checkpoint data in this case, we might want to keep it. Otherwise, we might run into exceptions when recovering from that checkpoint later on. We might also want to log a warning that points the user to the possibly orphaned checkpoint data.
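
      The proposed behaviour could be sketched roughly as follows. This is a minimal illustration and not Flink's actual code: the class `CheckpointAddSketch`, the interfaces `MetadataBackend` and `CheckpointStorage`, and the two exception types are hypothetical names introduced here. It assumes the backend client can tell a definite rejection apart from a failure whose outcome is unknown.

```java
import java.util.List;

public class CheckpointAddSketch {

    /** Hypothetical: the backend definitely did not store the pointer. */
    public static class DefiniteWriteFailure extends Exception {
        public DefiniteWriteFailure(String message) {
            super(message);
        }
    }

    /** Hypothetical: the call failed, but the pointer may have been stored (e.g. connection loss). */
    public static class AmbiguousWriteFailure extends Exception {
        public AmbiguousWriteFailure(String message) {
            super(message);
        }
    }

    /** Hypothetical stand-in for the ZooKeeper or Kubernetes metadata backend. */
    public interface MetadataBackend {
        void writePointer(String key, String stateLocation)
                throws DefiniteWriteFailure, AmbiguousWriteFailure;
    }

    /** Hypothetical stand-in for the storage that holds the checkpoint data. */
    public interface CheckpointStorage {
        void discard(String stateLocation);
    }

    public enum Outcome {
        COMMITTED,
        DISCARDED,
        KEPT_POSSIBLY_ORPHANED
    }

    public static Outcome addCheckpoint(
            MetadataBackend backend,
            CheckpointStorage storage,
            String key,
            String stateLocation,
            List<String> warnings) {
        try {
            backend.writePointer(key, stateLocation);
            return Outcome.COMMITTED;
        } catch (DefiniteWriteFailure e) {
            // No pointer can exist in the backend, so discarding the data is safe.
            storage.discard(stateLocation);
            return Outcome.DISCARDED;
        } catch (AmbiguousWriteFailure e) {
            // A pointer may exist in the backend: keep the data and warn the user instead.
            warnings.add(
                    "Checkpoint data at " + stateLocation
                            + " was kept and may be orphaned: " + e.getMessage());
            return Outcome.KEPT_POSSIBLY_ORPHANED;
        }
    }
}
```

      In this sketch only the definite failure leads to a discard. The ambiguous failure trades a possibly orphaned checkpoint, which the user can clean up by following the warning, for the guarantee that no pointer in the backend refers to deleted data.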

            People

              mapohl Matthias Pohl