Details
- Improvement
- Status: Closed
- Major
- Resolution: Fixed
- 1.14.1
- None
Description
Problem
In the current implementation, checkpoints that fail in the trigger phase are not counted in the 'numberOfFailedCheckpoints' metric, so users cannot tell from this metric that checkpointing has stopped.
Users can work around this with an alert rule such as "'numberOfCompletedCheckpoints' has not increased in the past few minutes" (for example, checkpoint interval + timeout), but I think this is roundabout and cannot alert in a timely manner.
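To make the delay in that workaround concrete, here is a minimal sketch of such a stall rule. The class `CompletedCheckpointStallDetector` and its methods are hypothetical and not part of Flink; it only assumes that some monitoring job periodically samples 'numberOfCompletedCheckpoints'.

```java
import java.time.Duration;

/** Hypothetical stall rule over sampled values of numberOfCompletedCheckpoints; not a Flink API. */
public class CompletedCheckpointStallDetector {
    private final long windowMillis;
    private long lastCount = -1;          // -1 means no sample has been seen yet
    private long lastIncreaseMillis;

    public CompletedCheckpointStallDetector(Duration checkpointInterval, Duration checkpointTimeout) {
        // The rule can only fire after a full interval + timeout without progress.
        this.windowMillis = checkpointInterval.plus(checkpointTimeout).toMillis();
    }

    /** Feeds one metric sample; returns true when the alert should fire. */
    public boolean observe(long completedCount, long nowMillis) {
        if (lastCount < 0 || completedCount > lastCount) {
            // First sample, or the counter advanced: restart the window.
            lastCount = completedCount;
            lastIncreaseMillis = nowMillis;
            return false;
        }
        return nowMillis - lastIncreaseMillis > windowMillis;
    }
}
```

With a 1 minute interval and a 10 minute timeout, this rule stays silent for at least 11 minutes after the last completed checkpoint, even if every trigger attempt in between failed immediately.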
Proposal
As the title says, count checkpoints that fail in the trigger phase toward 'numberOfFailedCheckpoints'.
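The intended change in counting behavior can be illustrated with a small standalone sketch. This is not Flink's actual code: the class `CheckpointFailureCounter`, the `Phase` enum, and the `countTriggerFailures` flag are all hypothetical names used only to contrast the behavior described in the problem with the proposed one.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical model of the failed-checkpoint counter; names are illustrative, not Flink APIs. */
public class CheckpointFailureCounter {
    /** Where a checkpoint failed: while being triggered, or at some point after triggering. */
    public enum Phase { TRIGGER, AFTER_TRIGGER }

    private final AtomicLong numberOfFailedCheckpoints = new AtomicLong();
    private final boolean countTriggerFailures;

    /** Pass false to model the behavior described in the problem, true to model the proposal. */
    public CheckpointFailureCounter(boolean countTriggerFailures) {
        this.countTriggerFailures = countTriggerFailures;
    }

    public void onCheckpointFailed(Phase phase) {
        if (phase == Phase.TRIGGER && !countTriggerFailures) {
            return; // described problem: trigger-phase failures are invisible to the metric
        }
        numberOfFailedCheckpoints.incrementAndGet();
    }

    public long getNumberOfFailedCheckpoints() {
        return numberOfFailedCheckpoints.get();
    }
}
```

Under the proposal, an alert on an increasing 'numberOfFailedCheckpoints' would fire on the first trigger failure instead of waiting for a completed-checkpoint stall window to elapse.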
Issue Links
- duplicates
- FLINK-26049 The tolerable-failed-checkpoints logic is invalid when checkpoint trigger failed
- Closed