Details
- Type: Improvement
- Status: Resolved
- Priority: Major
- Resolution: Fixed
1.3.0
Description
Users have asked to see the last n exceptions that caused a job restart in the Web UI. This would make it easier to debug and operate a job.
We could store the root causes of failures in the same way prior executions are stored in the ExecutionVertex, using the EvictingBoundedList, and then serve this information via the Web UI.
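To illustrate the idea of keeping only the last n root causes, here is a minimal sketch. It is not Flink's EvictingBoundedList; the class name `BoundedFailureHistory` and its methods are hypothetical and only show the evict-oldest behaviour such a store would need.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Hypothetical bounded store for failure root causes (not Flink's EvictingBoundedList). */
public class BoundedFailureHistory {
    private final int maxSize;
    private final Deque<Throwable> causes = new ArrayDeque<>();

    public BoundedFailureHistory(int maxSize) {
        if (maxSize <= 0) {
            throw new IllegalArgumentException("maxSize must be positive");
        }
        this.maxSize = maxSize;
    }

    /** Records a root cause, evicting the oldest one when the history is full. */
    public synchronized void add(Throwable cause) {
        if (causes.size() == maxSize) {
            causes.removeFirst();
        }
        causes.addLast(cause);
    }

    /** Returns a copy of the retained causes, oldest first, e.g. for serving to a UI. */
    public synchronized List<Throwable> snapshot() {
        return new ArrayList<>(causes);
    }
}
```

With `maxSize` set to n, memory use stays bounded no matter how often the job restarts, at the cost of losing older failures.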
-- Update: January 21, 2021 --
The UI can already handle multiple exceptions through the Exception History. Right now, it lists the one or more exceptions that caused the job to fail. Instead, we could adapt it so that the history contains not only the exceptions of the most recent failure but one expandable entry per restart. If more than one exception is connected to a single restart, their stack traces would be listed within that one expandable entry.
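The per-restart grouping described in the update could be modelled roughly as follows. This is a sketch under assumptions, not the data model Flink actually uses: `RestartHistory`, `Entry`, and `recordRestart` are hypothetical names chosen for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical history holding one entry per restart (illustrative only). */
public class RestartHistory {

    /** One expandable entry: every exception connected to a single restart. */
    public static final class Entry {
        private final long timestamp;
        private final List<Throwable> exceptions;

        Entry(long timestamp, List<Throwable> exceptions) {
            this.timestamp = timestamp;
            // Defensive copy so later changes by the caller do not alter the entry.
            this.exceptions = Collections.unmodifiableList(new ArrayList<>(exceptions));
        }

        public long getTimestamp() {
            return timestamp;
        }

        public List<Throwable> getExceptions() {
            return exceptions;
        }
    }

    private final int maxEntries;
    private final List<Entry> entries = new ArrayList<>();

    public RestartHistory(int maxEntries) {
        if (maxEntries <= 0) {
            throw new IllegalArgumentException("maxEntries must be positive");
        }
        this.maxEntries = maxEntries;
    }

    /** Adds one entry for a restart, dropping the oldest entry if the limit is reached. */
    public synchronized void recordRestart(long timestamp, List<Throwable> exceptions) {
        if (exceptions.isEmpty()) {
            throw new IllegalArgumentException("a restart needs at least one exception");
        }
        if (entries.size() == maxEntries) {
            entries.remove(0);
        }
        entries.add(new Entry(timestamp, exceptions));
    }

    /** Returns a read-only copy of the entries, oldest first. */
    public synchronized List<Entry> getEntries() {
        return Collections.unmodifiableList(new ArrayList<>(entries));
    }
}
```

A UI could then render each `Entry` as one expandable row and list the stack traces of `getExceptions()` inside it.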
Attachments
Issue Links
- is duplicated by
  - FLINK-12662 Record failure and restart information for ExecutionGraph to keep history information (Closed)
- relates to
  - FLINK-14143 Failed Attempts display in the timeline (Open)
  - FLINK-22144 Test display last n exceptions/causes for job restarts in Web UI (Closed)
  - FLINK-21439 Adaptive Scheduler: Add support for exception history (Closed)