Details
-
New Feature
-
Status: Open
-
Minor
-
Resolution: Unresolved
-
1.12.0
-
None
Description
This is the Jira ticket for Milestone One of FLIP-135 Approximate Task-Local Recovery.
In short, in Approximate Task-Local Recovery, if a task fails, only the failed task restarts, without affecting the rest of the job. To ease discussion, we divide the problem of approximate task-local recovery into three parts, each addressing one set of problems. This Jira ticket addresses the first milestone.
Milestone One: sink recovery. Here, a sink task is a task with no consumers reading data from it. In this scenario, if a sink vertex fails, the sink is restarted from the last successfully completed checkpoint and data loss is expected. If a non-sink vertex fails, the regional failover strategy applies. In Milestone One, we focus on issues related to task failure handling and upstream reconnection.
Milestone One includes two parts:
Part 1: Network part: how the failed task reconnects to the upstream Result(Sub)Partitions and continues processing data.
Part 2: Scheduling part: a new failover strategy that restarts only the sink when the sink fails.
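The scheduling decision in Part 2 can be illustrated with a small sketch. This is not Flink's failover strategy API. The function names, the `consumers` dictionary, and the use of a connected component as a stand-in for a failover region are all assumptions made for illustration only.

```python
from collections import deque


def is_sink(task, consumers):
    """A sink, in the ticket's sense, has no consumers reading from it."""
    return not consumers.get(task)


def connected_region(task, consumers):
    """Hypothetical stand-in for a failover region: every task reachable
    from `task` when edges are followed in both directions."""
    neighbours = {}
    for producer, downstream in consumers.items():
        neighbours.setdefault(producer, set())
        for consumer in downstream:
            neighbours[producer].add(consumer)
            neighbours.setdefault(consumer, set()).add(producer)
    seen = {task}
    queue = deque([task])
    while queue:
        current = queue.popleft()
        for nxt in neighbours.get(current, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def tasks_to_restart(failed, consumers):
    """Restart only the failed task if it is a sink; otherwise fall back
    to restarting the whole (assumed) region that contains it."""
    if is_sink(failed, consumers):
        return {failed}
    return connected_region(failed, consumers)


# Illustrative job graph: task name -> tasks that consume its output.
consumers = {
    "source": ["map"],
    "map": ["sink-0", "sink-1"],
    "sink-0": [],
    "sink-1": [],
}
```

With this graph, a failure of `sink-0` selects only `sink-0` for restart, while a failure of `map` selects every task in the region. The sketch covers only the choice of tasks to restart; reconnecting the restarted sink to its upstream Result(Sub)Partitions (Part 1) is out of its scope.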
Attachments