Details
- Type: Improvement
- Status: Closed
- Priority: Major
- Resolution: Fixed
- 1.12.2, 1.13.1
Description
When the JobManager's disk fails, triggerCheckpoint throws an IOException and the trigger fails. This results in a TRIGGER_CHECKPOINT_FAILURE, but that failure does not cause the job to fail. Users are unlikely to notice the error unless they check the JobManager logs. To avoid this, I propose that we identify these IOException cases and increment the failureCounter, so that the job eventually fails.
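As a rough illustration of the proposal, the sketch below counts trigger failures whose cause chain contains an IOException and reports when the job should be failed. It is not Flink's implementation: the class `TriggerFailureCounter`, its methods, the `tolerableFailures` threshold, and the reset-on-success behavior are all hypothetical names and assumptions made for this example.

```java
import java.io.IOException;

/**
 * Hypothetical sketch only: counts checkpoint-trigger failures caused by
 * an IOException and signals when the job should fail. All names here are
 * assumptions for illustration, not Flink's actual classes.
 */
final class TriggerFailureCounter {
    private final int tolerableFailures; // assumed threshold
    private int continuousFailures;

    TriggerFailureCounter(int tolerableFailures) {
        if (tolerableFailures < 0) {
            throw new IllegalArgumentException("tolerableFailures must be >= 0");
        }
        this.tolerableFailures = tolerableFailures;
    }

    /**
     * Records a trigger failure. Returns true once the number of counted
     * failures exceeds the tolerable threshold, i.e. the job should fail.
     */
    boolean onTriggerFailure(Throwable cause) {
        if (!hasIOExceptionCause(cause)) {
            return false; // this sketch counts only IOException-related failures
        }
        continuousFailures++;
        return continuousFailures > tolerableFailures;
    }

    /** Assumption: a successful checkpoint resets the counter. */
    void onCheckpointSuccess() {
        continuousFailures = 0;
    }

    int getContinuousFailures() {
        return continuousFailures;
    }

    /** Walks the cause chain looking for an IOException. */
    private static boolean hasIOExceptionCause(Throwable t) {
        while (t != null) {
            if (t instanceof IOException) {
                return true;
            }
            t = t.getCause();
        }
        return false;
    }
}
```

With a threshold of 1, for example, the first IOException-caused trigger failure is tolerated and the second one returns true, at which point a caller would fail the job instead of leaving the error visible only in the logs.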
Attachments
Issue Links
- is duplicated by
  - FLINK-24249 login from keytab fail when disk damage (Closed)
- is related to
  - FLINK-24344 Handling of IOExceptions when triggering checkpoints doesn't cause job failover (Closed)
- links to