Details
- Type: Improvement
- Status: Closed
- Priority: Major
- Resolution: Fixed
None
Description
If a checkpoint times out, the Web UI currently shows no stats for the tasks that had not yet finished, so you have to dig through the (debug?) logs.
It would be nice to have these incomplete stats in the Web UI as well, so that you can quickly see what was going on. I can think of two ways to accomplish this:
- the checkpoint coordinator could ask the TMs for it after failing the checkpoint or
- the TMs could send the stats when they notice that the checkpoint is aborted
Maybe there are more options, but I think this improvement would in general make debugging checkpoints easier.
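To make the second option a bit more concrete, here is a minimal sketch of what the coordinator side could keep around after a timeout. All names (`TimedOutCheckpointStats`, `SubtaskStats`, `report`, and the metric fields) are hypothetical and are not Flink's actual classes or API; the sketch only assumes that each subtask can send whatever partial numbers it has once it notices the abort.

```java
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch, not Flink's actual classes or API.
final class TimedOutCheckpointStats {

    // Partial metrics one subtask reports when it notices the abort (assumed shape).
    static final class SubtaskStats {
        final long bytesPersisted;
        final long alignmentMillis;
        final boolean acknowledged;

        SubtaskStats(long bytesPersisted, long alignmentMillis, boolean acknowledged) {
            this.bytesPersisted = bytesPersisted;
            this.alignmentMillis = alignmentMillis;
            this.acknowledged = acknowledged;
        }
    }

    private final Set<String> expectedSubtasks;
    private final Map<String, SubtaskStats> reported = new LinkedHashMap<>();

    TimedOutCheckpointStats(Collection<String> expectedSubtasks) {
        this.expectedSubtasks = new TreeSet<>(expectedSubtasks);
    }

    // Called when a subtask sends its (possibly incomplete) stats after the abort.
    void report(String subtask, SubtaskStats stats) {
        if (!expectedSubtasks.contains(subtask)) {
            throw new IllegalArgumentException("Unknown subtask: " + subtask);
        }
        reported.put(subtask, stats);
    }

    // Subtasks that sent nothing at all, e.g. because they were stuck or unreachable.
    Set<String> silentSubtasks() {
        Set<String> silent = new TreeSet<>(expectedSubtasks);
        silent.removeAll(reported.keySet());
        return silent;
    }

    // Subtasks the UI should highlight: silent ones plus those that reported without acknowledging.
    Set<String> unfinishedSubtasks() {
        Set<String> unfinished = silentSubtasks();
        for (Map.Entry<String, SubtaskStats> entry : reported.entrySet()) {
            if (!entry.getValue().acknowledged) {
                unfinished.add(entry.getKey());
            }
        }
        return unfinished;
    }
}
```

Keeping the silent subtasks separate from the ones that reported partial numbers would let the UI distinguish "slow" from "not responding", which is usually the first question when a checkpoint times out.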
Issue Links
- causes
  - FLINK-21217 Resuming Savepoint (rocks, scale up, rocks timers) end-to-end test (Closed)
  - FLINK-21272 Resuming Savepoint (rocks, scale down, rocks timers) end-to-end test' Fail (Closed)
  - FLINK-21122 Update checkpoint_monitoring.zh.md (Closed)
- is related to
  - FLINK-20886 Add the option to get a threaddump on checkpoint timeouts (Open)