FLINK-21510

ExecutionGraph metrics collide on restart

Details

    Description

      The ExecutionGraphBuilder registers several metrics directly on the JobManagerJobMetricGroup, and these metrics are never cleaned up.

      These include uptime/downtime/restartingTime as well as various checkpointing metrics (see the CheckpointStatsTracker for details; examples include the number of checkpoints and checkpoint sizes).

      When the AdaptiveScheduler re-creates the ExecutionGraph (EG), these metrics collide with the metrics registered by prior attempts.
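
      The collision can be illustrated with a minimal, self-contained sketch. None of the classes below are Flink classes: SimpleMetricGroup, GraphAttempt and MetricCollisionDemo are hypothetical stand-ins, and the first-registration-wins behavior is an assumption of this model, not a statement about Flink's implementation.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical stand-in for a job-scoped metric group (not Flink's class).
// Assumption: the first registration of a name wins, later ones are rejected.
final class SimpleMetricGroup {
    private final Map<String, Supplier<Long>> gauges = new HashMap<>();

    // Returns false if the name is already taken; the old gauge is kept.
    boolean gauge(String name, Supplier<Long> gauge) {
        return gauges.putIfAbsent(name, gauge) == null;
    }

    long read(String name) {
        return gauges.get(name).get();
    }
}

// Hypothetical stand-in for one ExecutionGraph attempt.
final class GraphAttempt {
    final long uptimeMillis;

    GraphAttempt(long uptimeMillis) {
        this.uptimeMillis = uptimeMillis;
    }
}

final class MetricCollisionDemo {
    // Mimics a builder that registers metrics directly on the job-level group
    // and never removes them again.
    static boolean buildAndRegister(SimpleMetricGroup jobGroup, GraphAttempt attempt) {
        return jobGroup.gauge("uptime", () -> attempt.uptimeMillis);
    }

    public static void main(String[] args) {
        SimpleMetricGroup jobGroup = new SimpleMetricGroup();
        boolean first = buildAndRegister(jobGroup, new GraphAttempt(100L));
        // Re-creating the graph registers the same name on the same group again.
        boolean second = buildAndRegister(jobGroup, new GraphAttempt(5L));
        System.out.println("first registered: " + first);
        System.out.println("second registered: " + second);
        System.out.println("uptime reported: " + jobGroup.read("uptime"));
    }
}
```

      In this model the second registration is rejected, so the gauge keeps reporting the value of the first attempt even though that attempt no longer exists.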

      Essentially, we need to either create a separate metric group that we pass to the EG, or refactor the metrics to be based on some mutable EG reference.
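
      Both options can be sketched roughly as follows. This is an illustration under stated assumptions, not a proposed patch: JobMetrics, Attempt, MutableGraphMetrics and AttemptMetricGroup are hypothetical names, and the registry only models first-registration-wins semantics.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;

// Hypothetical job-scoped registry (not Flink's API): first registration wins.
final class JobMetrics {
    private final Map<String, Supplier<Long>> gauges = new HashMap<>();

    boolean gauge(String name, Supplier<Long> gauge) {
        return gauges.putIfAbsent(name, gauge) == null;
    }

    boolean unregister(String name) {
        return gauges.remove(name) != null;
    }

    long read(String name) {
        return gauges.get(name).get();
    }
}

// Hypothetical stand-in for one ExecutionGraph attempt.
final class Attempt {
    final long uptimeMillis;

    Attempt(long uptimeMillis) {
        this.uptimeMillis = uptimeMillis;
    }
}

// Option 1: register once per job and resolve the current graph at read time.
final class MutableGraphMetrics {
    private final AtomicReference<Attempt> current = new AtomicReference<>();

    MutableGraphMetrics(JobMetrics jobGroup) {
        jobGroup.gauge("uptime", () -> {
            Attempt attempt = current.get();
            return attempt == null ? 0L : attempt.uptimeMillis;
        });
    }

    // Called whenever the scheduler re-creates the graph.
    void setCurrent(Attempt attempt) {
        current.set(attempt);
    }
}

// Option 2: a separate group per attempt that removes its metrics on close.
final class AttemptMetricGroup implements AutoCloseable {
    private final JobMetrics parent;
    private final List<String> names = new ArrayList<>();

    AttemptMetricGroup(JobMetrics parent) {
        this.parent = parent;
    }

    boolean gauge(String name, Supplier<Long> gauge) {
        boolean registered = parent.gauge(name, gauge);
        if (registered) {
            names.add(name); // remember only what this attempt registered
        }
        return registered;
    }

    @Override
    public void close() {
        names.forEach(parent::unregister);
        names.clear();
    }
}
```

      With the first option the metric is registered exactly once and the scheduler only swaps the reference. With the second option each attempt gets its own group, which must be closed before the next attempt registers the same names.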

            People

              chesnay Chesnay Schepler
              Votes: 0
              Watchers: 4
