Details
- Type: Bug
- Status: Closed
- Priority: Critical
- Resolution: Duplicate
- Affects Version/s: 1.11.0
- None
- None
Description
We have a Flink 1.11.0 job running on YARN that reached the FAILED state because its JobManager lost leadership during a ZooKeeper (ZK) full GC. After the ZK connection recovered, the job was somehow re-initiated with no checkpoints found in ZK, so it was restored from an earlier savepoint, which unexpectedly rewound the job.
For details, please see the JobManager logs in the attachment.
Attachments
Issue Links
- duplicates FLINK-19816 Flink restored from a wrong checkpoint (a very old one and not the last completed one) (Closed)