Flink / FLINK-4256

Fine-grained recovery

Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.1.0
    • Fix Version/s: 1.9.0
    • Component/s: Runtime / Coordination
    • Labels: None

    Description

      When a task fails during execution, Flink currently resets the entire execution graph and triggers a complete re-execution from the last completed checkpoint. This is more expensive than re-executing only the failed tasks.

      In many cases, more fine-grained recovery is possible.
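
      To make the contrast concrete, here is a minimal sketch, not Flink code. It assumes that "fine-grained" means restarting only the tasks connected to the failed task through pipelined data exchanges; that rule, the function name tasks_to_restart, and the (producer, consumer) edge format are assumptions made for illustration. The FLIP linked below is the authoritative design.

```python
from collections import defaultdict, deque


def tasks_to_restart(edges, failed_task, fine_grained=True):
    """Return the set of tasks to restart after `failed_task` fails.

    `edges` is an iterable of (producer, consumer) pairs that exchange
    data in a pipelined fashion (hypothetical input format for this sketch).
    """
    neighbours = defaultdict(set)
    all_tasks = {failed_task}
    for producer, consumer in edges:
        # Treat edges as undirected: a failure affects both sides of a pipeline.
        neighbours[producer].add(consumer)
        neighbours[consumer].add(producer)
        all_tasks.update((producer, consumer))

    if not fine_grained:
        # Behaviour described in this issue: reset the whole execution graph.
        return all_tasks

    # Assumed fine-grained rule: breadth-first search for the connected
    # component that contains the failed task.
    region = {failed_task}
    queue = deque([failed_task])
    while queue:
        task = queue.popleft()
        for other in neighbours[task]:
            if other not in region:
                region.add(other)
                queue.append(other)
    return region


# Two independent pipelines in one job (illustrative task names).
edges = [("source-a", "map-a"), ("map-a", "sink-a"), ("source-b", "sink-b")]
print(sorted(tasks_to_restart(edges, "map-a")))
print(sorted(tasks_to_restart(edges, "map-a", fine_grained=False)))
```

      With these example edges, a failure of map-a restarts three tasks under the assumed fine-grained rule and all five tasks under a full restart; source-b and sink-b keep running in the fine-grained case.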

      The full description and design are in the corresponding FLIP.

      https://cwiki.apache.org/confluence/display/FLINK/FLIP-1+%3A+Fine+Grained+Recovery+from+Task+Failures

      The detailed design for version 1 is at https://docs.google.com/document/d/1_PqPLA1TJgjlqz8fqnVE3YSisYBDdFsrRX_URgRSj74/edit#

            People

              Assignee: Stephan Ewen (sewen)
              Reporter: Stephan Ewen (sewen)
              Votes: 0
              Watchers: 39

              Dates

                Created:
                Updated:
                Resolved:

                Time Tracking

                  Estimated: 168h
                  Remaining: 167.5h
                  Logged: 5h 10m