This issue tracks the development of recovery from task-local state. The main idea is to have a secondary, local copy of the checkpointed state, while there is still a primary copy in DFS that we report to the checkpoint coordinator.
Recovery can attempt to restore from the secondary local copy, if available, to save network bandwidth. This requires that the assignment from tasks to slots is as sticky is possible.
For starters, we will implement this feature for all managed keyed states and can easily enhance it to all other state types (e.g. operator state) later.