[FLINK-18112] Approximate Task-Local Recovery -- Milestone One - ASF JIRA

Details

Type: New Feature
Status: Open
Priority: Minor
Resolution: Unresolved
Affects Version/s: 1.12.0
Fix Version/s: None
Component/s: Runtime / Checkpointing, Runtime / Coordination, Runtime / Network
Labels:
- auto-deprioritized-major
- auto-unassigned

Description

This is the Jira ticket for Milestone One of FLIP-135, Approximate Task-Local Recovery.

In short, with Approximate Task-Local Recovery, if a task fails, only the failed task restarts, without affecting the rest of the job. To ease discussion, we divide the problem of approximate task-local recovery into three parts, each addressing one set of problems. This Jira ticket addresses the first milestone.

Milestone One: sink recovery. Here, a sink task is a task that has no consumers reading data from it. In this scenario, if a sink vertex fails, the sink is restarted from the last successfully completed checkpoint, and data loss is expected. If a non-sink vertex fails, the regional failover strategy applies. In Milestone One, we focus on issues related to task failure handling and upstream reconnection.

Milestone One includes two parts:

Part 1: Network. How the failed task reconnects to the upstream Result(Sub)Partitions and continues processing data.

Part 2: Scheduling. A new failover strategy that restarts only the sink when the sink fails.
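
The scheduling decision described above can be sketched roughly as follows. This is only an illustration of the rule "restart just the sink if the sink fails, otherwise fall back to regional failover"; it is not Flink's actual failover strategy interface. The class name `SinkOnlyFailoverSketch`, the method `tasksToRestart`, and the string task names are all hypothetical, and the pipelined region is approximated here as the set of tasks connected to the failed task, which is an assumption made for brevity.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch of the Milestone One rule; not Flink's real API. */
final class SinkOnlyFailoverSketch {

    /**
     * @param consumers  maps each task to the tasks that read from it
     *                   (a sink maps to an empty list)
     * @param failedTask the task that failed
     * @return the tasks that have to be restarted
     */
    static Set<String> tasksToRestart(Map<String, List<String>> consumers, String failedTask) {
        if (!consumers.containsKey(failedTask)) {
            throw new IllegalArgumentException("Unknown task: " + failedTask);
        }
        // A sink has no consumers, so restarting it alone affects nobody downstream.
        if (consumers.get(failedTask).isEmpty()) {
            return Set.of(failedTask);
        }
        // Non-sink failure: fall back to restarting the whole region.
        return connectedRegion(consumers, failedTask);
    }

    /** Assumption: a region is every task reachable over edges in either direction. */
    private static Set<String> connectedRegion(Map<String, List<String>> consumers, String start) {
        // Build an undirected view of the producer -> consumer edges.
        Map<String, Set<String>> neighbors = new HashMap<>();
        for (Map.Entry<String, List<String>> entry : consumers.entrySet()) {
            String producer = entry.getKey();
            neighbors.computeIfAbsent(producer, k -> new HashSet<>());
            for (String consumer : entry.getValue()) {
                neighbors.get(producer).add(consumer);
                neighbors.computeIfAbsent(consumer, k -> new HashSet<>()).add(producer);
            }
        }
        // Breadth-first traversal starting from the failed task.
        Set<String> region = new HashSet<>();
        Deque<String> queue = new ArrayDeque<>();
        region.add(start);
        queue.add(start);
        while (!queue.isEmpty()) {
            String current = queue.poll();
            for (String next : neighbors.getOrDefault(current, Set.of())) {
                if (region.add(next)) {
                    queue.add(next);
                }
            }
        }
        return region;
    }
}
```

For a linear job source -> map -> sink, a failure of `sink` would return only `sink`, while a failure of `map` would return all three tasks.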

Attachments

image-2021-12-14-10-30-26-486.png	14/Dec/21 02:30	0.3 kB	LiuZeshan
image-2021-12-06-16-39-21-604.png	06/Dec/21 08:39	0.3 kB	LiuZeshan

Sub-Tasks

1.	Partial Record Cleanup after the Consumer Task Fails and Restart	Closed	Yuan Mei
2.	Introduce a New ResultPartitionType for Approximate Local Recovery	Resolved	Yuan Mei
3.	Scheduler Change for Approximate Local Recovery to Restart Downstream of a Failed Task	Closed	Yuan Mei
4.	Make Approximate Local Recovery Compatible With PipelinedRegionSchedulingStrategy	Open	Unassigned
5.	Introduce Sub Partition View Version for Approximate Local Recovery	Open	Unassigned
6.	Unify Life Cycle Management of ResultPartitionType Pipelined Family	Open	Unassigned
7.	Make Approximate Local Recovery Compatible With Unaligned Checkpoint	Open	Unassigned
8.	Single Task Failure Recovery API Abstraction	Open	Unassigned

People

Assignee: Unassigned

Reporter: Yuan Mei

Votes: 0

Watchers: 24

Dates

Created: 04/Jun/20 04:04

Updated: 22/May/23 09:09