[FLINK-8360] Implement task-local state recovery - ASF JIRA

Attach files

Attach Screenshot

Voters

Watch issue

Watchers

Create sub-task

Link

Clone

Update Comment Author

Replace String in Comment

Update Comment Visibility

Delete Comments

XML

Word

Printable

JSON

Details

Type: New Feature
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 1.5.0
Component/s: Runtime / State Backends
Labels:
None

Description

This issue tracks the development of recovery from task-local state. The main idea is to have a secondary, local copy of the checkpointed state, while there is still a primary copy in DFS that we report to the checkpoint coordinator.

Recovery can attempt to restore from the secondary local copy, if available, to save network bandwidth. This requires that the assignment from tasks to slots is as sticky is possible.

For starters, we will implement this feature for all managed keyed states and can easily enhance it to all other state types (e.g. operator state) later.