Flink / FLINK-32347

Exceptions from the CompletedCheckpointStore are not registered by the CheckpointFailureManager

Details

    Description

      Currently, if an error occurs while saving a completed checkpoint to the CompletedCheckpointStore, the CheckpointCoordinator does not call the CheckpointFailureManager to handle the error. As a result, errors from the CompletedCheckpointStore do not increase the failed-checkpoint count, and the 'execution.checkpointing.tolerable-failed-checkpoints' option places no limit on how many errors of this kind can occur.
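      The reported behavior can be illustrated with a minimal, self-contained model. SimpleFailureManager, FailingStore and CurrentCoordinator are hypothetical, heavily simplified stand-ins, not Flink's real classes or signatures. The failure manager only counts consecutive failures against a fixed tolerance (playing the role of 'execution.checkpointing.tolerable-failed-checkpoints'), and the coordinator reports success before the store write and merely logs a store error.

```java
import java.util.ArrayList;
import java.util.List;

// Entry point; listed first so the file also runs with `java Main.java`.
class Main {
    public static void main(String[] args) {
        // Hypothetical tolerance of 2, standing in for
        // execution.checkpointing.tolerable-failed-checkpoints.
        SimpleFailureManager failureManager = new SimpleFailureManager(2);
        CurrentCoordinator coordinator = new CurrentCoordinator(failureManager);
        for (long id = 1; id <= 5; id++) {
            coordinator.completeCheckpoint(id);
        }
        // Every store write failed, yet no failure was counted.
        System.out.println("store errors: " + coordinator.loggedErrors.size()
                + ", counted failures: " + failureManager.failureCount());
    }
}

// Hypothetical, simplified stand-in for CheckpointFailureManager:
// it only tracks consecutive failures against a fixed tolerance.
class SimpleFailureManager {
    private final int tolerableFailures;
    private int continuousFailures = 0;

    SimpleFailureManager(int tolerableFailures) {
        this.tolerableFailures = tolerableFailures;
    }

    void handleSuccess() {
        continuousFailures = 0;
    }

    // Returns true when the tolerance is exceeded and the job should fail.
    boolean handleFailure() {
        continuousFailures++;
        return continuousFailures > tolerableFailures;
    }

    int failureCount() {
        return continuousFailures;
    }
}

// Hypothetical store that rejects every write, simulating a broken backend.
class FailingStore {
    void add(long checkpointId) throws Exception {
        throw new Exception("cannot store checkpoint " + checkpointId);
    }
}

// Models the reported behavior: success is signalled before the store write,
// and a store error is only logged, never passed to the failure manager.
class CurrentCoordinator {
    final SimpleFailureManager failureManager;
    final FailingStore store = new FailingStore();
    final List<String> loggedErrors = new ArrayList<>();

    CurrentCoordinator(SimpleFailureManager failureManager) {
        this.failureManager = failureManager;
    }

    void completeCheckpoint(long checkpointId) {
        failureManager.handleSuccess();
        try {
            store.add(checkpointId);
        } catch (Exception e) {
            loggedErrors.add(e.getMessage());
        }
    }
}
```

      Running it prints "store errors: 5, counted failures: 0": five consecutive store errors never reach the failure manager, so a tolerance of 2 is never exceeded.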

      A possible solution is to move the notification of the CheckpointFailureManager about a successful checkpoint so that it happens after the completed checkpoint has been stored in the CompletedCheckpointStore, and to pass any exception from the store to the CheckpointFailureManager in the CheckpointCoordinator#addCompletedCheckpointToStoreAndSubsumeOldest() method.
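      The proposed ordering can be sketched with the same kind of simplified model. CheckpointStore, SimpleFailureManager and ProposedCoordinator are hypothetical stand-ins, not Flink's API, and the real failure manager fails the job through its own callback rather than a boolean flag. The coordinator writes to the store first, reports any store exception as a failure, and reports success only after the write succeeds.

```java
// Entry point; listed first so the file also runs with `java Main.java`.
class Main {
    public static void main(String[] args) {
        SimpleFailureManager failureManager = new SimpleFailureManager(2);
        // Hypothetical store that accepts checkpoint 1 and rejects all later ones.
        CheckpointStore store = checkpointId -> {
            if (checkpointId > 1) {
                throw new Exception("cannot store checkpoint " + checkpointId);
            }
        };
        ProposedCoordinator coordinator = new ProposedCoordinator(failureManager, store);
        long id = 1;
        while (!coordinator.jobFailed && id <= 10) {
            coordinator.completeCheckpoint(id);
            id++;
        }
        System.out.println("job failed after checkpoint " + (id - 1)
                + " with " + failureManager.failureCount() + " counted failures");
    }
}

// Hypothetical store abstraction; the real CompletedCheckpointStore API differs.
interface CheckpointStore {
    void add(long checkpointId) throws Exception;
}

// Hypothetical, simplified stand-in for CheckpointFailureManager.
class SimpleFailureManager {
    private final int tolerableFailures;
    private int continuousFailures = 0;

    SimpleFailureManager(int tolerableFailures) {
        this.tolerableFailures = tolerableFailures;
    }

    void handleSuccess() {
        continuousFailures = 0;
    }

    // Returns true when the tolerance is exceeded and the job should fail.
    boolean handleFailure() {
        continuousFailures++;
        return continuousFailures > tolerableFailures;
    }

    int failureCount() {
        return continuousFailures;
    }
}

// Models the proposed order: store first, then report the outcome.
class ProposedCoordinator {
    final SimpleFailureManager failureManager;
    final CheckpointStore store;
    boolean jobFailed = false;

    ProposedCoordinator(SimpleFailureManager failureManager, CheckpointStore store) {
        this.failureManager = failureManager;
        this.store = store;
    }

    void completeCheckpoint(long checkpointId) {
        try {
            store.add(checkpointId);
        } catch (Exception e) {
            // The store error now counts against the tolerance.
            if (failureManager.handleFailure()) {
                jobFailed = true;
            }
            return;
        }
        // Success is reported only after the checkpoint has been stored.
        failureManager.handleSuccess();
    }
}
```

      Running it prints "job failed after checkpoint 4 with 3 counted failures". Reporting success only after the store write also means the failure counter is never reset by a checkpoint that did not actually become durable.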

            People

              Stefan Richter (srichter)
              Tigran Manasyan (vivacell)
              Votes: 0
              Watchers: 3
