Details

    • Type: Improvement
    • Status: In Progress
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 1.1.0
    • Fix Version/s: None
    • Component/s: JobManager
    • Labels:
      None

      Description

      When a task fails during execution, Flink currently resets the entire execution graph and triggers complete re-execution from the last completed checkpoint. This is more expensive than just re-executing the failed tasks.

      In many cases, more fine-grained recovery is possible.

      The full description and design is in the corresponding FLIP.

      https://cwiki.apache.org/confluence/display/FLINK/FLIP-1+%3A+Fine+Grained+Recovery+from+Task+Failures

      The detail desgin for version1 is https://docs.google.com/document/d/1_PqPLA1TJgjlqz8fqnVE3YSisYBDdFsrRX_URgRSj74/edit#

        Attachments

          Issue Links

            Activity

              People

              • Assignee:
                StephanEwen Stephan Ewen
                Reporter:
                StephanEwen Stephan Ewen
              • Votes:
                0 Vote for this issue
                Watchers:
                24 Start watching this issue

                Dates

                • Created:
                  Updated: