
FLINK-5193: Recovering all jobs fails completely if a single recovery fails

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.2.0, 1.1.3
    • Fix Version/s: 1.2.0, 1.1.4
    • Component/s: JobManager
    • Labels: None

Description

      In the HA case, where the JobManager tries to recover all submitted job graphs, e.g. after regaining leadership, it can happen that none of the submitted jobs is recovered if a single recovery fails. Instead of failing the complete recovery procedure, the JobManager should still try to recover the remaining (non-failing) jobs and log a proper error message for the failed recoveries.
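      A minimal, self-contained Scala sketch of the intended behaviour (the job ids and `recoverJob` below are illustrative stand-ins, not the actual JobManager API): each job is recovered inside its own `Try`, so a corrupt job graph fails only its own recovery while the remaining jobs still come back.

      ```
      import scala.util.{Failure, Success, Try}

      object IndependentRecovery {
        // Illustrative stand-ins; the real JobManager recovers job graphs
        // from a SubmittedJobGraphStore.
        val jobIds = Seq("job-a", "job-b", "job-c")

        def recoverJob(id: String): Unit =
          if (id == "job-b") throw new RuntimeException(s"corrupt state for $id")

        def main(args: Array[String]): Unit = {
          // Each recovery runs in its own Try: a single failure is logged
          // and skipped instead of aborting the whole loop.
          jobIds.foreach { id =>
            Try(recoverJob(id)) match {
              case Success(_) => println(s"Recovered $id")
              case Failure(t) => println(s"Failed to recover $id: ${t.getMessage}")
            }
          }
        }
      }
      ```

      Running this recovers job-a and job-c and prints an error for job-b, which is exactly the behaviour the fix aims for.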

Activity

          githubbot ASF GitHub Bot added a comment -

          GitHub user tillrohrmann opened a pull request:

          https://github.com/apache/flink/pull/2909

          FLINK-5193 [jm] Harden job recovery in case of recovery failures

          When recovering multiple jobs, a single recovery failure caused all jobs to remain unrecovered.
          This PR makes the recovery of jobs independent of one another, so that a single
          failure won't make the complete recovery fail. Furthermore, this PR improves the error reporting
          for failures originating in the ZooKeeperSubmittedJobGraphStore.

          Add test case

          Fix failing JobManagerHACheckpointRecoveryITCase

          You can merge this pull request into a Git repository by running:

          $ git pull https://github.com/tillrohrmann/flink fixJobRecoveryFailure

          Alternatively you can review and apply these changes as the patch at:

          https://github.com/apache/flink/pull/2909.patch

          To close this pull request, make a commit to your master/trunk branch
          with (at least) the following in the commit message:

          This closes #2909


          commit d61636d0465e0e0f274871a883d8d376c223a1f3
          Author: Till Rohrmann <trohrmann@apache.org>
          Date: 2016-11-29T16:31:08Z

          FLINK-5193 [jm] Harden job recovery in case of recovery failures

          When recovering multiple jobs, a single recovery failure caused all jobs to remain unrecovered.
          This PR makes the recovery of jobs independent of one another, so that a single
          failure won't stall the complete recovery. Furthermore, this PR improves the error reporting
          for failures originating in the ZooKeeperSubmittedJobGraphStore.

          Add test case

          Fix failing JobManagerHACheckpointRecoveryITCase


          githubbot ASF GitHub Bot added a comment -

          GitHub user tillrohrmann opened a pull request:

          https://github.com/apache/flink/pull/2910

          [backport] FLINK-5193 [jm] Harden job recovery in case of recovery failures

          This is a backport of #2909 to the release 1.1 branch.

          When recovering multiple jobs, a single recovery failure caused all jobs to remain unrecovered.
          This PR makes the recovery of jobs independent of one another, so that a single
          failure won't stall the complete recovery. Furthermore, this PR improves the error reporting
          for failures originating in the ZooKeeperSubmittedJobGraphStore.

          cc @uce

          You can merge this pull request into a Git repository by running:

          $ git pull https://github.com/tillrohrmann/flink backportFixJobRecoveryFailure

          Alternatively you can review and apply these changes as the patch at:

          https://github.com/apache/flink/pull/2910.patch

          To close this pull request, make a commit to your master/trunk branch
          with (at least) the following in the commit message:

          This closes #2910


          commit 01620e88ca5a963941ced979c143ab95777249d8
          Author: Till Rohrmann <trohrmann@apache.org>
          Date: 2016-11-29T16:31:08Z

          FLINK-5193 [jm] Harden job recovery in case of recovery failures

          When recovering multiple jobs, a single recovery failure caused all jobs to remain unrecovered.
          This PR makes the recovery of jobs independent of one another, so that a single
          failure won't stall the complete recovery. Furthermore, this PR improves the error reporting
          for failures originating in the ZooKeeperSubmittedJobGraphStore.

          Add test case

          Fix failing JobManagerHACheckpointRecoveryITCase


          githubbot ASF GitHub Bot added a comment -

          Github user uce commented on a diff in the pull request:

          https://github.com/apache/flink/pull/2910#discussion_r90237601

          — Diff: flink-runtime/src/main/scala/org/apache/flink/runtime/jobmanager/JobManager.scala —

@@ -505,37 +507,31 @@ class JobManager(
             }
           }
         } catch {
-          case t: Throwable => log.error(s"Failed to recover job $jobId.", t)
+          case t: Throwable => log.warn(s"Failed to recover job $jobId.", t)
         }
       }(context.dispatcher)

     case RecoverAllJobs =>
       future {
-        try {
-          // The ActorRef, which is part of the submitted job graph, can only be
-          // de-serialized in the scope of an actor system.
-          akka.serialization.JavaSerializer.currentSystem.withValue(
-            context.system.asInstanceOf[ExtendedActorSystem]) {
+        log.info("Attempting to recover all jobs.")
-            log.info(s"Attempting to recover all jobs.")
-
-            val jobGraphs = submittedJobGraphs.recoverJobGraphs().asScala
+        try {
+          val jobIdsToRecover = submittedJobGraphs.getJobIds().asScala
-            if (!leaderElectionService.hasLeadership()) {
-              // we've lost leadership. mission: abort.
-              log.warn(s"Lost leadership during recovery. Aborting recovery of ${jobGraphs.size} " +
-                s"jobs.")
-            } else {
-              log.info(s"Re-submitting ${jobGraphs.size} job graphs.")
+          if (jobIdsToRecover.isEmpty) {
+            log.info("There are no jobs to recover.")
+          } else {
+            log.info(s"There are ${jobIdsToRecover.size} jobs to recover. Starting the job " +

          — End diff —

          Should we do an `if-else` on the log level here and print the job IDs on debug?
          ```
          if (isDebug()) {
            // There are ${jobIdsToRecover.size} jobs to recover, plus their IDs
          } else {
            // What you already have
          }
          ```

          githubbot ASF GitHub Bot added a comment -

          Github user uce commented on a diff in the pull request:

          https://github.com/apache/flink/pull/2910#discussion_r90235223

          — Diff: flink-runtime/src/main/java/org/apache/flink/runtime/jobmanager/SubmittedJobGraphStore.java —
@@ -64,6 +57,14 @@
   void removeJobGraph(JobID jobId) throws Exception;

+  /**
+   * Get all job ids of submitted job graphs to the submitted job graph store.

          — End diff —

          Good idea to separate it this way.
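          The design point being praised, as a Scala sketch (signatures paraphrased; the real interface is the Java `SubmittedJobGraphStore`, and `JobId`/`JobGraph` below are illustrative stand-ins): listing the submitted job ids is separated from recovering an individual graph, so materializing one corrupt graph can only fail its own `recoverJobGraph` call.

          ```
          // Sketch only: paraphrased from the Java SubmittedJobGraphStore interface.
          final case class JobId(value: String)
          final case class JobGraph(id: JobId)

          trait JobGraphStore {
            /** Cheap listing of the ids of all submitted job graphs. */
            @throws[Exception]
            def getJobIds(): Iterable[JobId]

            /** Recover a single graph; a failure here affects only this job. */
            @throws[Exception]
            def recoverJobGraph(jobId: JobId): JobGraph
          }
          ```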

          githubbot ASF GitHub Bot added a comment -

          Github user uce commented on a diff in the pull request:

          https://github.com/apache/flink/pull/2910#discussion_r90235653

          — Diff: flink-runtime/src/main/java/org/apache/flink/runtime/jobmanager/ZooKeeperSubmittedJobGraphStore.java —
@@ -275,6 +242,25 @@ public void removeJobGraph(JobID jobId) throws Exception {
     }
   }

+  @Override
+  public Collection<JobID> getJobIds() throws Exception {
+    Collection<String> paths;
+
+    try {
+      paths = jobGraphsInZooKeeper.getAllPaths();
+    } catch (Exception e) {
+      throw new Exception("Failed to retrieve entry paths from ZooKeeperStateHandleStore.", e);
+    }
+
+    List<JobID> jobIds = new ArrayList<>(paths.size());
+
+    for (String path : paths) {
+      jobIds.add(jobIdfromPath(path));

          — End diff —

          If we have a malformed sub node, this will skip recovery of all jobs again, right? Wondering if we should wrap this line in a `try-catch`. It's quite unlikely, though, as someone would need to put the node there manually. Feel free to not address this.
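          A Scala sketch of the suggested hardening (the `jobIdFromPath` parser below is hypothetical; the real store is the Java `ZooKeeperSubmittedJobGraphStore`): wrap the per-path parse so a malformed child node is logged and skipped rather than failing the whole listing.

          ```
          import scala.util.{Failure, Success, Try}

          object SkipMalformedNodes {
            // Hypothetical parser: a node name is expected to be a
            // 32-character hex job id.
            def jobIdFromPath(path: String): String = {
              val name = path.stripPrefix("/")
              require(name.matches("[0-9a-fA-F]{32}"), s"Malformed job id node: $path")
              name
            }

            def main(args: Array[String]): Unit = {
              val paths = Seq("/" + "a" * 32, "/manually-created-junk", "/" + "b" * 32)

              // Wrapping the parse in Try means one bad node no longer
              // prevents recovery of all jobs.
              val jobIds = paths.flatMap { path =>
                Try(jobIdFromPath(path)) match {
                  case Success(id) => Some(id)
                  case Failure(t) =>
                    println(s"Skipping malformed node $path: ${t.getMessage}")
                    None
                }
              }

              println(s"Job ids eligible for recovery: $jobIds")
            }
          }
          ```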

          githubbot ASF GitHub Bot added a comment -

          Github user StephanEwen commented on a diff in the pull request:

          https://github.com/apache/flink/pull/2910#discussion_r90446364

          — Diff: flink-runtime/src/main/scala/org/apache/flink/runtime/jobmanager/JobManager.scala —

@@ -505,37 +507,31 @@ class JobManager(
             }
           }
         } catch {
-          case t: Throwable => log.error(s"Failed to recover job $jobId.", t)
+          case t: Throwable => log.warn(s"Failed to recover job $jobId.", t)
         }
       }(context.dispatcher)

     case RecoverAllJobs =>
       future {
-        try {
-          // The ActorRef, which is part of the submitted job graph, can only be
-          // de-serialized in the scope of an actor system.
-          akka.serialization.JavaSerializer.currentSystem.withValue(
-            context.system.asInstanceOf[ExtendedActorSystem]) {
+        log.info("Attempting to recover all jobs.")
-            log.info(s"Attempting to recover all jobs.")
-
-            val jobGraphs = submittedJobGraphs.recoverJobGraphs().asScala
+        try {
+          val jobIdsToRecover = submittedJobGraphs.getJobIds().asScala
-            if (!leaderElectionService.hasLeadership()) {
-              // we've lost leadership. mission: abort.
-              log.warn(s"Lost leadership during recovery. Aborting recovery of ${jobGraphs.size} " +
-                s"jobs.")
-            } else {
-              log.info(s"Re-submitting ${jobGraphs.size} job graphs.")
+          if (jobIdsToRecover.isEmpty) {
+            log.info("There are no jobs to recover.")
+          } else {
+            log.info(s"There are ${jobIdsToRecover.size} jobs to recover. Starting the job " +

          — End diff —

          I think logging all job IDs on info level is fine.

          githubbot ASF GitHub Bot added a comment -

          Github user tillrohrmann commented on the issue:

          https://github.com/apache/flink/pull/2910

          Thanks for the review @StephanEwen & @uce. Will address your comment regarding the malformed job id.

          githubbot ASF GitHub Bot added a comment -

          Github user tillrohrmann commented on the issue:

          https://github.com/apache/flink/pull/2909

          Forwarding @uce's and @StephanEwen's review from the backport to this PR.

          Rebasing on the latest master; if Travis gives the green light, I will merge this PR.

          githubbot ASF GitHub Bot added a comment -

          Github user tillrohrmann commented on the issue:

          https://github.com/apache/flink/pull/2909

          Merging...

          githubbot ASF GitHub Bot added a comment -

          Github user asfgit closed the pull request at:

          https://github.com/apache/flink/pull/2909

          till.rohrmann Till Rohrmann added a comment -

          Fixed in 1.2.0 via add3765d1626a04fb98b8f36cb725eb32806d8b6
          Fixed in 1.1.4 via d314bc5235e2573ff77f45d327bc62f521063b71

          githubbot ASF GitHub Bot added a comment -

          Github user tillrohrmann closed the pull request at:

          https://github.com/apache/flink/pull/2910


People

  • Assignee: till.rohrmann Till Rohrmann
  • Reporter: till.rohrmann Till Rohrmann
  • Votes: 0
  • Watchers: 2
