Details
New Feature
Status: Closed
Major
Resolution: Fixed
None
None
Description
This issue tracks the development of recovery from task-local state. The main idea is to keep a secondary, local copy of the checkpointed state, while the primary copy remains in DFS and is the one reported to the checkpoint coordinator.
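A minimal sketch of the dual-copy idea, assuming a snapshot is just a byte array and that both storage tiers can be modelled as in-memory maps. The class names `DualCopySnapshot` and `DualCopySnapshotter`, the `dfs://` and `file://` location scheme, and the `localDiskFull` switch are all hypothetical and are not Flink's API. The point it shows is the asymmetry between the copies: the DFS write is mandatory and is the only location reported to the coordinator, while the local write is best-effort.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical result of one snapshot; illustrative only, not a Flink class. */
final class DualCopySnapshot {
    final String primaryLocation;   // DFS copy, reported to the checkpoint coordinator
    final String secondaryLocation; // task-local copy, or null if the local write failed

    DualCopySnapshot(String primaryLocation, String secondaryLocation) {
        this.primaryLocation = primaryLocation;
        this.secondaryLocation = secondaryLocation;
    }
}

/** Writes each snapshot twice: once to DFS (mandatory) and once to local disk (best-effort). */
final class DualCopySnapshotter {
    // In-memory stand-ins for the distributed file system and the TaskManager's local disk.
    final Map<String, byte[]> dfs = new HashMap<>();
    final Map<String, byte[]> localDisk = new HashMap<>();
    boolean localDiskFull = false; // hypothetical switch to simulate a failing local write

    DualCopySnapshot snapshot(String taskId, long checkpointId, byte[] state) {
        String path = taskId + "/chk-" + checkpointId;

        // Primary copy: mandatory. A real DFS write failure would fail the checkpoint (not simulated).
        String primary = "dfs://" + path;
        dfs.put(primary, state.clone());

        // Secondary copy: if it fails, only the local-recovery fast path is lost.
        String secondary = "file://" + path;
        try {
            writeLocal(secondary, state);
        } catch (IllegalStateException e) {
            secondary = null;
        }
        return new DualCopySnapshot(primary, secondary);
    }

    private void writeLocal(String location, byte[] state) {
        if (localDiskFull) {
            throw new IllegalStateException("local disk full");
        }
        localDisk.put(location, state.clone());
    }

    /** Only the primary location is acknowledged to the coordinator; the local copy stays on the task side. */
    static String reportToCoordinator(DualCopySnapshot snapshot) {
        return snapshot.primaryLocation;
    }
}
```

Keeping the local write best-effort leaves the DFS copy as the single source of truth for the coordinator: losing the local copy only means the next recovery has to read from DFS.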
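Recovery can attempt to restore from the secondary local copy, if available, to save network bandwidth, and otherwise fall back to the primary copy in DFS. This requires that the assignment of tasks to slots is as sticky as possible, so that a recovering task is scheduled where its local copy resides.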
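A minimal sketch of local-first restore with DFS fallback, together with a sticky slot choice, assuming the same in-memory storage model as above and assuming the restoring task already knows which checkpoint (and therefore which locations) to restore. The names `LocalFirstRestorer`, `StickySlotAssigner`, `previousSlotOf`, and `bytesReadOverNetwork` are hypothetical and do not correspond to Flink's scheduler or state-backend classes.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Restores from the task-local copy when present, otherwise from DFS. Illustrative only. */
final class LocalFirstRestorer {
    final Map<String, byte[]> dfs = new HashMap<>();
    final Map<String, byte[]> localDisk = new HashMap<>();
    int bytesReadOverNetwork = 0; // the traffic the local copy is meant to avoid

    byte[] restore(String primaryLocation, String secondaryLocation) {
        // 1. Local copy: only found if the task was rescheduled onto the TaskManager that wrote it.
        if (secondaryLocation != null) {
            byte[] local = localDisk.get(secondaryLocation);
            if (local != null) {
                return local.clone();
            }
        }
        // 2. Fallback: the primary DFS copy, which the coordinator knows about.
        byte[] remote = dfs.get(primaryLocation);
        if (remote == null) {
            throw new IllegalStateException("primary copy missing: " + primaryLocation);
        }
        bytesReadOverNetwork += remote.length;
        return remote.clone();
    }
}

/** Sticky assignment: prefer the slot that ran the task's previous attempt, if it is still free. */
final class StickySlotAssigner {
    static String assign(String taskId, Map<String, String> previousSlotOf, List<String> freeSlots) {
        String previous = previousSlotOf.get(taskId);
        if (previous != null && freeSlots.remove(previous)) {
            return previous; // local state from the last attempt is on this slot's TaskManager
        }
        if (freeSlots.isEmpty()) {
            throw new IllegalStateException("no free slots for " + taskId);
        }
        return freeSlots.remove(0); // no sticky match: restore will read from DFS
    }
}
```

The fallback keeps correctness independent of scheduling: stickiness only decides how often the fast local path is taken, never whether recovery succeeds.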
For starters, we will implement this feature for all managed keyed state; it can easily be extended to other state types (e.g. operator state) later.
Attachments
Issue Links
- depends upon
  - FLINK-7719 Send checkpoint id to task as part of deployment descriptor when resuming (Closed)
  - FLINK-7720 Centralize creation of backends and state related resources (Closed)
- is related to
  - FLINK-11159 Allow configuration whether to fall back to savepoints for restore (Closed)
- supercedes
  - FLINK-7873 Introduce CheckpointCacheManager for reading checkpoint data locally when performing failover (Closed)
- links to