Flink / FLINK-6027

Ignore the exception thrown by the subsuming of old completed checkpoints

Details

    Description

      When a checkpoint is added to the CompletedCheckpointStore via addCheckpoint(), the oldest checkpoints are removed from the store if the number of stored checkpoints exceeds the configured limit. Subsuming these old checkpoints may fail, causing addCheckpoint() to throw an exception, which the CheckpointCoordinator catches. The CheckpointCoordinator then deletes the state of the new checkpoint. However, the new checkpoint has already been added to the store, so the job may later be recovered from it, and that recovery will fail because all of the checkpoint's state has been deleted.
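The failure sequence above can be reproduced with a small, self-contained Java sketch. It does not use Flink's real CompletedCheckpointStore or CheckpointCoordinator: the classes `SketchCheckpoint`, `NaiveCheckpointStore`, `NaiveCoordinator`, and `SubsumeFailureDemo` are hypothetical, simplified stand-ins, and a subsume failure is simulated with a boolean flag. It assumes, as the description implies, that the new checkpoint is appended to the store before the old ones are subsumed.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical stand-in for a completed checkpoint (not Flink's CompletedCheckpoint).
class SketchCheckpoint {
    final long id;
    final boolean failOnDiscard; // simulates a storage error while deleting state
    boolean stateDiscarded = false;

    SketchCheckpoint(long id, boolean failOnDiscard) {
        this.id = id;
        this.failOnDiscard = failOnDiscard;
    }

    void discardState() throws Exception {
        if (failOnDiscard) {
            throw new Exception("could not discard state of checkpoint " + id);
        }
        stateDiscarded = true;
    }
}

// Hypothetical bounded store: the new checkpoint is appended first,
// then the oldest checkpoints are subsumed until the limit is respected.
class NaiveCheckpointStore {
    private final int maxRetained;
    private final Deque<SketchCheckpoint> checkpoints = new ArrayDeque<>();

    NaiveCheckpointStore(int maxRetained) {
        this.maxRetained = maxRetained;
    }

    void addCheckpoint(SketchCheckpoint cp) throws Exception {
        checkpoints.addLast(cp);
        while (checkpoints.size() > maxRetained) {
            SketchCheckpoint oldest = checkpoints.removeFirst();
            oldest.discardState(); // a failure here escapes, although cp is already stored
        }
    }

    // Recovery uses the newest retained checkpoint and needs its state to exist.
    long recoverLatest() {
        SketchCheckpoint cp = checkpoints.peekLast();
        if (cp == null || cp.stateDiscarded) {
            throw new IllegalStateException("cannot recover: state of the latest checkpoint is missing");
        }
        return cp.id;
    }

    SketchCheckpoint latest() {
        return checkpoints.peekLast();
    }
}

// Hypothetical coordinator step: any exception from addCheckpoint() is treated
// as a failed checkpoint, and the new checkpoint's state is deleted.
class NaiveCoordinator {
    static void completeCheckpoint(NaiveCheckpointStore store, SketchCheckpoint cp) {
        try {
            store.addCheckpoint(cp);
        } catch (Exception e) {
            cp.stateDiscarded = true; // models deleting the state of the new checkpoint
        }
    }
}

class SubsumeFailureDemo {
    public static void main(String[] args) {
        NaiveCheckpointStore store = new NaiveCheckpointStore(1);
        // Checkpoint 1 will fail to be discarded when it is subsumed.
        NaiveCoordinator.completeCheckpoint(store, new SketchCheckpoint(1, true));
        // Adding checkpoint 2 subsumes checkpoint 1, which throws.
        NaiveCoordinator.completeCheckpoint(store, new SketchCheckpoint(2, false));
        try {
            System.out.println("recovered from checkpoint " + store.recoverLatest());
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Running `SubsumeFailureDemo` reports that recovery fails: checkpoint 2 is still the latest entry in the store, but the coordinator has already deleted its state.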

      We should ignore exceptions thrown while subsuming old checkpoints: once the new checkpoint has been successfully added to the store, the job can always be recovered from it. Ignoring these exceptions may leave some dirty data behind (the state of old checkpoints that could not be deleted), but this is acceptable because such data can be removed by the cleanup hook to be introduced in the near future.
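A minimal Java sketch of the proposed behaviour follows. `RetainedCheckpoint` and `TolerantCheckpointStore` are hypothetical classes, not Flink's CompletedCheckpointStore implementation, and adding a checkpoint is modeled as an in-memory append that cannot fail. Exceptions raised while subsuming old checkpoints are logged with `java.util.logging` and ignored; the affected checkpoints are collected in a `leftovers` list that stands in for the dirty data a future cleanup hook would remove (the hook itself is not modeled).

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical stand-in for a completed checkpoint whose state deletion may fail.
class RetainedCheckpoint {
    final long id;
    private final boolean failOnDiscard; // simulates a storage error while deleting state
    boolean stateDiscarded = false;

    RetainedCheckpoint(long id, boolean failOnDiscard) {
        this.id = id;
        this.failOnDiscard = failOnDiscard;
    }

    void discardState() throws Exception {
        if (failOnDiscard) {
            throw new Exception("could not discard state of checkpoint " + id);
        }
        stateDiscarded = true;
    }
}

// Hypothetical bounded store that logs and ignores failures while subsuming
// old checkpoints, so the newly added checkpoint is never reported as failed.
class TolerantCheckpointStore {
    private static final Logger LOG = Logger.getLogger(TolerantCheckpointStore.class.getName());

    private final int maxRetained;
    private final Deque<RetainedCheckpoint> checkpoints = new ArrayDeque<>();
    // Subsumed checkpoints whose state may still exist (the "dirty data");
    // a later cleanup step would be responsible for them.
    final List<RetainedCheckpoint> leftovers = new ArrayList<>();

    TolerantCheckpointStore(int maxRetained) {
        this.maxRetained = maxRetained;
    }

    void addCheckpoint(RetainedCheckpoint cp) {
        checkpoints.addLast(cp); // once this succeeds, cp is recoverable
        while (checkpoints.size() > maxRetained) {
            RetainedCheckpoint oldest = checkpoints.removeFirst();
            try {
                oldest.discardState();
            } catch (Exception e) {
                // Ignore the failure: it only leaves dirty data behind.
                LOG.log(Level.WARNING, "Could not discard subsumed checkpoint " + oldest.id, e);
                leftovers.add(oldest);
            }
        }
    }

    RetainedCheckpoint latest() {
        return checkpoints.peekLast();
    }

    int size() {
        return checkpoints.size();
    }
}
```

With a limit of one retained checkpoint, adding checkpoint 2 after a checkpoint 1 whose deletion fails leaves checkpoint 2 as the latest entry with its state intact, while checkpoint 1 ends up in `leftovers` for later cleanup. The design choice is that a failed subsume is reported only as a warning, because it never affects the recoverability of the checkpoint just added.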

            People

              Assignee: Unassigned
              Reporter: Xiaogang Shi (shixg)
              Votes: 0
              Watchers: 4