  1. Flink
  2. FLINK-18112

Approximate Task-Local Recovery -- Milestone One

Details

    Description

      This is the Jira ticket for Milestone One of FLIP-135, Approximate Task-Local Recovery.

      In short, with Approximate Task-Local Recovery, if a task fails, only the failed task restarts, without affecting the rest of the job. To ease discussion, we divide the problem of approximate task-local recovery into three parts, each of which addresses its own set of problems. This Jira ticket addresses the first milestone.

      Milestone One: sink recovery. Here, a sink task is a task that has no consumers reading data from it. In this scenario, if a sink vertex fails, only the sink is restarted, from the last successfully completed checkpoint, and data loss is expected. If a non-sink vertex fails, the regional failover strategy applies. In Milestone One, we focus on issues related to task failure handling and upstream reconnection.

      Milestone One includes two sets of changes:

      Part 1: Network part: how the failed task is able to reconnect to the upstream Result(Sub)Partitions and continue processing data.

      Part 2: Scheduling part: a new failover strategy that restarts only the sink when the sink fails.
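
      The scheduling decision described above can be sketched as follows. This is a minimal illustration, not Flink's actual failover strategy API: the class name ApproximateFailoverSketch, the method tasksToRestart, and the map-based job model (task name to consumers, task name to failover region) are all hypothetical and exist only to show the rule "restart only the sink if a sink fails, otherwise fall back to regional failover".

```java
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch; none of these names come from Flink itself.
public class ApproximateFailoverSketch {

    /**
     * Decides which tasks to restart when failedTask fails.
     *
     * @param failedTask name of the task that failed
     * @param consumers  task name -> downstream tasks reading its output
     * @param regions    task name -> all tasks in its failover region
     */
    public static Set<String> tasksToRestart(
            String failedTask,
            Map<String, List<String>> consumers,
            Map<String, Set<String>> regions) {
        // A sink is a task with no consumers reading data from it.
        boolean isSink = consumers.getOrDefault(failedTask, List.of()).isEmpty();
        if (isSink) {
            // Sink failure: restart only the sink itself.
            return Set.of(failedTask);
        }
        // Non-sink failure: fall back to restarting the whole region.
        return regions.getOrDefault(failedTask, Set.of(failedTask));
    }

    public static void main(String[] args) {
        // Toy pipeline: source -> map -> sink, all in one failover region.
        Map<String, List<String>> consumers = Map.of(
                "source", List.of("map"),
                "map", List.of("sink"),
                "sink", List.of());
        Set<String> wholeJob = Set.of("source", "map", "sink");
        Map<String, Set<String>> regions = Map.of(
                "source", wholeJob, "map", wholeJob, "sink", wholeJob);

        System.out.println(tasksToRestart("sink", consumers, regions));
        System.out.println(tasksToRestart("map", consumers, regions).size());
    }
}
```

      In this toy job, a failure of "sink" restarts one task, while a failure of "map" restarts all three tasks in its region. The network part (reconnecting the restarted sink to the upstream Result(Sub)Partitions) is not modeled here.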

      Attachments

        1. image-2021-12-06-16-39-21-604.png
          0.3 kB
          LiuZeshan
        2. image-2021-12-14-10-30-26-486.png
          0.3 kB
          LiuZeshan

            People

              Unassigned
              Yuan Mei