Details
- Bug
- Status: Open
- Major
- Resolution: Unresolved
- 1.15.0
- None
Description
The DefaultJobManagerRunner.localCleanupAsync method first deregisters the JobManagerRunner and then calls close on it. If close fails for any reason, the failure is detected, but the next retry only finds that the JobManagerRunner is already deregistered and does nothing.
Hence, the JobMaster shutdown is never retriggered (i.e. errors in the CompletedCheckpointStore or the CheckpointIDCounter are not handled). FLINK-26114 is related: neither component currently exposes any errors anyway.
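The ordering problem described above can be reproduced with a minimal, self-contained sketch. It does not use Flink's classes: FlakyRunner, the map-based registry, and both cleanup methods are hypothetical stand-ins that exist only for illustration. The second method shows one possible alternative ordering (close first, deregister only on success); it is an assumption for comparison, not the fix chosen for this issue.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.CompletableFuture;

public class CleanupRetrySketch {

    /** Hypothetical stand-in for a runner whose close fails a fixed number of times. */
    static final class FlakyRunner {
        private int remainingFailures;
        int closeAttempts = 0;
        boolean closed = false;

        FlakyRunner(int failures) {
            this.remainingFailures = failures;
        }

        CompletableFuture<Void> closeAsync() {
            closeAttempts++;
            CompletableFuture<Void> result = new CompletableFuture<>();
            if (remainingFailures > 0) {
                remainingFailures--;
                result.completeExceptionally(new IllegalStateException("close failed"));
            } else {
                closed = true;
                result.complete(null);
            }
            return result;
        }
    }

    /** Ordering described in this issue: deregister first, then close. */
    static CompletableFuture<Void> deregisterThenClose(Map<String, FlakyRunner> registry, String jobId) {
        FlakyRunner runner = registry.remove(jobId);
        if (runner == null) {
            // A retry lands here: the runner is gone, so close is never called again.
            return CompletableFuture.completedFuture(null);
        }
        return runner.closeAsync();
    }

    /** Assumed alternative: close first, deregister only if close succeeded. */
    static CompletableFuture<Void> closeThenDeregister(Map<String, FlakyRunner> registry, String jobId) {
        FlakyRunner runner = registry.get(jobId);
        if (runner == null) {
            return CompletableFuture.completedFuture(null);
        }
        // thenRun is skipped when closeAsync completed exceptionally,
        // so the runner stays registered and a retry can close it again.
        return runner.closeAsync().thenRun(() -> registry.remove(jobId));
    }

    static boolean succeeded(CompletableFuture<Void> future) {
        return future.isDone() && !future.isCompletedExceptionally();
    }

    public static void main(String[] args) {
        Map<String, FlakyRunner> registry = new HashMap<>();

        FlakyRunner first = new FlakyRunner(1);
        registry.put("job-a", first);
        boolean a1 = succeeded(deregisterThenClose(registry, "job-a"));
        boolean a2 = succeeded(deregisterThenClose(registry, "job-a"));
        System.out.println("deregister-then-close: attempt1=" + a1 + " retry=" + a2
                + " closed=" + first.closed + " closeAttempts=" + first.closeAttempts);

        FlakyRunner second = new FlakyRunner(1);
        registry.put("job-b", second);
        boolean b1 = succeeded(closeThenDeregister(registry, "job-b"));
        boolean b2 = succeeded(closeThenDeregister(registry, "job-b"));
        System.out.println("close-then-deregister: attempt1=" + b1 + " retry=" + b2
                + " closed=" + second.closed + " closeAttempts=" + second.closeAttempts);
    }
}
```

In the first variant the retry reports success although close was attempted only once and the runner was never closed, which mirrors the behaviour reported here. In the second variant the failed attempt leaves the runner registered, so the retry calls close a second time.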
Issue Links
- is related to FLINK-26114 DefaultScheduler fails fatally in case of an error when shutting down the checkpoint-related resources (Open)