Project: Flink
Parent: FLINK-4815 Automatic fallback to earlier checkpoints when checkpoint restore fails
Issue: FLINK-4818

RestartStrategy should track how many failed restore attempts the same checkpoint has and fall back to earlier checkpoints

Details

    • Type: Sub-task
    • Status: Closed
    • Priority: Major
    • Resolution: Won't Do
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Runtime / Coordination
    • Labels: None

    Description

      The restart strategies can use the exception information from FLINK-4816 to keep track of how often a checkpoint restore has failed. After a certain number of consecutive failures, they should take earlier completed checkpoints as recovery points.

      It is open to discussion whether the restart strategies are the right place to implement this, or whether it is an orthogonal feature that belongs in the checkpoint coordinator (which knows how many checkpoints are available) or in a separate class altogether.
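
      The fallback behavior described above can be sketched as a small standalone class. This is only an illustration under stated assumptions, not Flink code: the class name CheckpointFallbackTracker, its methods, and the maxFailuresPerCheckpoint threshold are all hypothetical, and checkpoints are represented by bare IDs rather than by any real Flink type.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.OptionalLong;

/**
 * Hypothetical helper (not part of Flink): counts consecutive failed restore
 * attempts per checkpoint and picks the newest checkpoint still worth trying.
 */
final class CheckpointFallbackTracker {

    // Assumed threshold: how many consecutive restore failures a checkpoint may have.
    private final int maxFailuresPerCheckpoint;

    // IDs of completed checkpoints, oldest first.
    private final List<Long> completedCheckpoints = new ArrayList<>();

    // Consecutive failed restore attempts, keyed by checkpoint ID.
    private final Map<Long, Integer> failedRestores = new HashMap<>();

    CheckpointFallbackTracker(int maxFailuresPerCheckpoint) {
        if (maxFailuresPerCheckpoint < 1) {
            throw new IllegalArgumentException("maxFailuresPerCheckpoint must be >= 1");
        }
        this.maxFailuresPerCheckpoint = maxFailuresPerCheckpoint;
    }

    /** Registers a newly completed checkpoint; IDs are expected in ascending order. */
    void addCompletedCheckpoint(long checkpointId) {
        completedCheckpoints.add(checkpointId);
    }

    /** Records one more failed restore attempt for the given checkpoint. */
    void restoreFailed(long checkpointId) {
        failedRestores.merge(checkpointId, 1, Integer::sum);
    }

    /** A successful restore ends the run of consecutive failures. */
    void restoreSucceeded(long checkpointId) {
        failedRestores.remove(checkpointId);
    }

    /**
     * Returns the newest checkpoint whose failure count is below the threshold,
     * or an empty value if every known checkpoint has been exhausted.
     */
    OptionalLong nextRestoreCandidate() {
        for (int i = completedCheckpoints.size() - 1; i >= 0; i--) {
            long id = completedCheckpoints.get(i);
            if (failedRestores.getOrDefault(id, 0) < maxFailuresPerCheckpoint) {
                return OptionalLong.of(id);
            }
        }
        return OptionalLong.empty();
    }
}
```

      Keeping the counter outside both the restart strategy and the checkpoint coordinator corresponds to the "separate class" option. The same bookkeeping could equally live in either of the other two places.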

          People

            Assignee: Unassigned
            Reporter: Stephan Ewen (sewen)
            Votes: 0
            Watchers: 5