Flink / FLINK-26114

DefaultScheduler fails fatally in case of an error when shutting down the checkpoint-related resources

Details

    • Type: Bug
    • Status: Open
    • Priority: Critical
    • Resolution: Unresolved
    • Affects Version/s: 1.15.0
    • Fix Version/s: None
    • Component/s: Runtime / Coordination
    • Labels: None

    Description

      In contrast to the AdaptiveScheduler, the DefaultScheduler fails fatally when an error occurs while cleaning up the checkpoint-related resources. This contradicts our new approach of retrying the cleanup of job-related data (see FLINK-25433). Instead, the DefaultScheduler should return an exceptionally completed future that carries the exception, which enables the DefaultResourceCleaner to trigger a retry.
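
      The proposed behavior can be illustrated with a minimal sketch that uses only java.util.concurrent. It is not Flink's actual code: CheckpointResources, closeAsync and cleanupWithRetry are hypothetical names, and the retry loop is only a simplified stand-in for what a cleaner such as the DefaultResourceCleaner might do with a failed future.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicInteger;

public class CleanupRetrySketch {

    /** Hypothetical stand-in for the checkpoint-related resources of a job. */
    public interface CheckpointResources {
        void shutdown() throws Exception;
    }

    /** Reports a shutdown error through the returned future instead of failing fatally. */
    public static CompletableFuture<Void> closeAsync(CheckpointResources resources) {
        CompletableFuture<Void> result = new CompletableFuture<>();
        try {
            resources.shutdown();
            result.complete(null);
        } catch (Exception e) {
            // The caller sees the error and can decide to retry.
            result.completeExceptionally(e);
        }
        return result;
    }

    /** Simplified retry (assumption): tries the cleanup at most maxAttempts times. */
    public static CompletableFuture<Void> cleanupWithRetry(
            CheckpointResources resources, int maxAttempts) {
        CompletableFuture<Void> result = closeAsync(resources);
        for (int attempt = 1; attempt < maxAttempts; attempt++) {
            result = result
                    .handle((ok, err) -> err == null
                            ? CompletableFuture.<Void>completedFuture(null)
                            : closeAsync(resources))
                    .thenCompose(f -> f);
        }
        return result;
    }

    public static void main(String[] args) {
        AtomicInteger calls = new AtomicInteger();
        // Fails on the first two attempts, succeeds on the third.
        CheckpointResources flaky = () -> {
            if (calls.incrementAndGet() < 3) {
                throw new IllegalStateException("cleanup failed");
            }
        };
        CompletableFuture<Void> done = cleanupWithRetry(flaky, 3);
        System.out.println("failed=" + done.isCompletedExceptionally()
                + ", attempts=" + calls.get());
    }
}
```

      Because the error travels inside the future, the component that owns the retry policy stays in control, and the scheduler itself does not have to decide whether a cleanup failure is fatal.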

      Neither scheduler implementation currently exposes errors that occur during the shutdown of the CompletedCheckpointStore or the CheckpointIDCounter. This would need to be addressed as well.
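
      One way to expose such shutdown errors is to attempt every shutdown, collect the failures, and surface them through a single future. The sketch below assumes a hypothetical Shutdownable interface and shutdownAll helper; it does not use the real signatures of CompletedCheckpointStore or CheckpointIDCounter.

```java
import java.util.concurrent.CompletableFuture;

public class ShutdownErrorSketch {

    /** Hypothetical abstraction over a component that can be shut down. */
    public interface Shutdownable {
        void shutdown() throws Exception;
    }

    /**
     * Shuts down every component, even if an earlier one fails. The first error
     * becomes the failure cause; later errors are attached as suppressed exceptions.
     */
    public static CompletableFuture<Void> shutdownAll(Shutdownable... components) {
        Exception firstError = null;
        for (Shutdownable component : components) {
            try {
                component.shutdown();
            } catch (Exception e) {
                if (firstError == null) {
                    firstError = e;
                } else {
                    firstError.addSuppressed(e);
                }
            }
        }
        CompletableFuture<Void> result = new CompletableFuture<>();
        if (firstError == null) {
            result.complete(null);
        } else {
            result.completeExceptionally(firstError);
        }
        return result;
    }

    public static void main(String[] args) {
        // Stand-ins for the checkpoint store and the checkpoint ID counter.
        Shutdownable store = () -> { throw new IllegalStateException("store shutdown failed"); };
        Shutdownable counter = () -> { throw new IllegalStateException("counter shutdown failed"); };

        Throwable error = shutdownAll(store, counter).handle((ok, err) -> err).join();
        System.out.println(error.getMessage() + " (+" + error.getSuppressed().length + " suppressed)");
    }
}
```

      Attaching later failures as suppressed exceptions keeps all of them visible to whoever inspects the future, so no shutdown error is silently dropped.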

            People

              Assignee: Atri Sharma (atri)
              Reporter: Matthias Pohl (mapohl)