Details
Type: Improvement
Status: Closed
Priority: Not a Priority
Resolution: Duplicate
Affects Version/s: 1.3.2, 1.4.0
Fix Version/s: None
Description
Currently, the only metric around fine-grained recovery is "task_failures". It is a very high-level metric; it would be nice to have the following improvements:
- The ability to slice and dice by which tasks were restarted.
- Recovery duration.
- Checkpoint behaviors associated with recovery: cancels, failures, etc.
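
To make the request concrete, here is a minimal sketch of the bookkeeping the three items above would need. It is plain Java with no Flink dependency; the class `RecoveryMetrics`, its method names, and the `CheckpointOutcome` categories are hypothetical and are not part of Flink's API. How these values would be registered with Flink's metric system is left open.

```java
import java.util.EnumMap;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical holder for fine-grained recovery metrics; not a Flink class. */
final class RecoveryMetrics {

    /** Assumed categories for checkpoints disturbed by a recovery. */
    enum CheckpointOutcome { CANCELED, FAILED }

    // Restart count keyed by task name, so restarts can be broken down per task.
    private final Map<String, Long> restartsPerTask = new HashMap<>();
    // Number of checkpoints canceled or failed in connection with a recovery.
    private final Map<CheckpointOutcome, Long> checkpointOutcomes =
            new EnumMap<>(CheckpointOutcome.class);

    private long recoveryCount;
    private long totalRecoveryMillis;
    private long lastRecoveryMillis;

    /** Records one restart of the given task. */
    void taskRestarted(String taskName) {
        restartsPerTask.merge(taskName, 1L, Long::sum);
    }

    /** Records a completed recovery from its start and end timestamps. */
    void recoveryFinished(long startMillis, long endMillis) {
        lastRecoveryMillis = endMillis - startMillis;
        totalRecoveryMillis += lastRecoveryMillis;
        recoveryCount++;
    }

    /** Records a checkpoint that was canceled or failed because of a recovery. */
    void checkpointAffected(CheckpointOutcome outcome) {
        checkpointOutcomes.merge(outcome, 1L, Long::sum);
    }

    long restartsOf(String taskName) {
        return restartsPerTask.getOrDefault(taskName, 0L);
    }

    long lastRecoveryMillis() {
        return lastRecoveryMillis;
    }

    /** Mean recovery duration, or 0.0 if no recovery has been recorded. */
    double averageRecoveryMillis() {
        return recoveryCount == 0 ? 0.0 : (double) totalRecoveryMillis / recoveryCount;
    }

    long checkpointCount(CheckpointOutcome outcome) {
        return checkpointOutcomes.getOrDefault(outcome, 0L);
    }
}
```

Keying restarts by task name is only one possible breakdown; an operator ID or failover region could serve the same purpose.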
Issue Links
- is related to
  FLINK-7844 Fine Grained Recovery triggers checkpoint timeout failure (Resolved)
- is superseded by
  FLINK-33695 FLIP-384: Introduce TraceReporter and use it to create checkpointing and recovery traces (Closed)