Details
- Type: Bug
- Status: Closed
- Priority: Major
- Resolution: Fixed
None
Description
The ExecutionGraphBuilder registers several metrics directly on the JobManagerJobMetricGroup, and these are never cleaned up.
They include uptime, downtime and restartingTime, as well as various checkpointing metrics (see the CheckpointStatsTracker for details; examples are the number of checkpoints and checkpoint sizes).
When the AdaptiveScheduler re-creates the ExecutionGraph (EG), these registrations will collide with the metrics of prior attempts.
Essentially, we need to either create a separate metric group that we pass to the EG, or refactor the metrics to be based on some mutable EG reference.
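To make the two options concrete, here is a minimal, self-contained sketch. It does not use Flink's classes: `Group` and `Graph` are hypothetical stand-ins for a metric group and an ExecutionGraph, and the toy registry is assumed to keep the first registration for a name and count later ones as collisions. The metric name is only illustrative.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;

/** Hypothetical stand-ins only; none of these types are Flink classes. */
final class MetricScopeSketch {

    /** Toy metric group: keeps the first gauge registered under a name, counts collisions. */
    static class Group {
        private final Map<String, Supplier<Long>> gauges = new HashMap<>();
        int collisions = 0;

        void gauge(String name, Supplier<Long> value) {
            if (gauges.putIfAbsent(name, value) != null) {
                collisions++; // the name is still held by an earlier registration
            }
        }

        long read(String name) {
            return gauges.get(name).get();
        }

        void close() {
            gauges.clear(); // drop everything registered on this group
        }
    }

    /** Toy execution graph exposing a single checkpoint statistic. */
    static class Graph {
        final long completedCheckpoints;

        Graph(long completedCheckpoints) {
            this.completedCheckpoints = completedCheckpoints;
        }
    }

    /** The problem: every re-created graph registers on the same long-lived group. */
    static void registerDirectly(Group group, Graph graph) {
        group.gauge("numberOfCompletedCheckpoints", () -> graph.completedCheckpoints);
    }

    /** Option 1: give each graph its own group and close the previous one. */
    static Group registerOnFreshGroup(Group previous, Graph graph) {
        if (previous != null) {
            previous.close();
        }
        Group perGraph = new Group();
        registerDirectly(perGraph, graph);
        return perGraph;
    }

    /** Option 2: register once, reading through a mutable reference to the current graph. */
    static void registerViaReference(Group group, AtomicReference<Graph> current) {
        group.gauge("numberOfCompletedCheckpoints", () -> current.get().completedCheckpoints);
    }

    public static void main(String[] args) {
        // Direct registration: the second graph collides and the gauge stays bound to the first.
        Group shared = new Group();
        registerDirectly(shared, new Graph(3));
        registerDirectly(shared, new Graph(0));
        System.out.println("direct: collisions=" + shared.collisions
                + " value=" + shared.read("numberOfCompletedCheckpoints"));

        // Option 1: a fresh group per graph, so nothing collides.
        Group first = registerOnFreshGroup(null, new Graph(3));
        Group second = registerOnFreshGroup(first, new Graph(0));
        System.out.println("per-graph group: collisions=" + second.collisions
                + " value=" + second.read("numberOfCompletedCheckpoints"));

        // Option 2: one registration that follows the reference when the graph is swapped.
        Group job = new Group();
        AtomicReference<Graph> current = new AtomicReference<>(new Graph(3));
        registerViaReference(job, current);
        current.set(new Graph(0));
        System.out.println("mutable reference: collisions=" + job.collisions
                + " value=" + job.read("numberOfCompletedCheckpoints"));
    }
}
```

In this sketch the direct registration leaves the gauge bound to the discarded graph, while both alternatives report the value of the current graph without a collision. Which of the two fits the scheduler better is a design question this sketch does not settle.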
Issue Links
- causes: FLINK-21855 Document Metrics Limitations (Closed)
- depends upon: FLINK-21075 FLIP-160: Adaptive scheduler (Closed)
- is related to: FLINK-21513 Rethink up-/down-/restartingTime metrics (Open)