FLINK-21510

ExecutionGraph metrics collide on restart

    Description

      The ExecutionGraphBuilder registers several metrics directly on the JobManagerJobMetricGroup, and these metrics are never cleaned up.

      These include uptime/downtime/restartingTime as well as various checkpointing metrics (see the CheckpointStatsTracker for details; examples are the number of checkpoints, checkpoint sizes, etc.).

      When the AdaptiveScheduler re-creates the ExecutionGraph (EG), these metrics will collide with the metrics registered by prior attempts.
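The collision can be illustrated with a minimal, self-contained sketch. Everything in it is hypothetical: `ToyMetricGroup`, `ToyExecutionGraph`, and the metric name `toyCheckpointCount` are stand-ins invented for illustration and are not Flink classes or Flink metric names. The sketch assumes the group keeps the first registration under a given name and rejects later ones, so the second attempt's gauge is dropped and the reported value stays tied to the first attempt.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Supplier;

public class MetricCollisionSketch {

    /** Hypothetical stand-in for a job-level metric group; not Flink's real class. */
    static final class ToyMetricGroup {
        private final Map<String, Supplier<Long>> gauges = new LinkedHashMap<>();
        private int collisions = 0;

        /** Assumption: the first registration under a name wins; later ones only count as collisions. */
        void gauge(String name, Supplier<Long> gauge) {
            if (gauges.putIfAbsent(name, gauge) != null) {
                collisions++;
            }
        }

        long read(String name) {
            return gauges.get(name).get();
        }

        int collisions() {
            return collisions;
        }
    }

    /** Hypothetical stand-in for one ExecutionGraph attempt that registers on the shared group. */
    static final class ToyExecutionGraph {
        private long completedCheckpoints;

        ToyExecutionGraph(long completedCheckpoints, ToyMetricGroup jobGroup) {
            this.completedCheckpoints = completedCheckpoints;
            // Registered directly on the job-level group and never removed.
            jobGroup.gauge("toyCheckpointCount", () -> this.completedCheckpoints);
        }
    }

    public static void main(String[] args) {
        ToyMetricGroup jobGroup = new ToyMetricGroup();
        new ToyExecutionGraph(7, jobGroup); // first attempt
        new ToyExecutionGraph(0, jobGroup); // re-created attempt, same group, same metric name

        System.out.println("collisions: " + jobGroup.collisions());
        System.out.println("reported value: " + jobGroup.read("toyCheckpointCount"));
    }
}
```

In this toy model the reported value still comes from the first attempt's object, which is the kind of stale reading the issue is concerned with.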

      Essentially, we need to either create a separate metric group that we pass to the EG, or refactor the metrics to be based on some mutable EG reference.
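Both options can be sketched in a small toy model. This is only an illustration under stated assumptions, not the fix that was applied in Flink: `ToyMetricGroup`, `ToyExecutionGraph`, `closeGroup`, the group name `executionGraph`, and the metric name `toyCheckpointCount` are all hypothetical. Option 1 gives each attempt its own child group that is dropped before the next attempt registers; option 2 registers the gauge once and lets it read through a mutable reference to the current attempt.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;

public class MetricFixSketch {

    /** Hypothetical stand-in for one ExecutionGraph attempt. */
    static final class ToyExecutionGraph {
        final long completedCheckpoints;

        ToyExecutionGraph(long completedCheckpoints) {
            this.completedCheckpoints = completedCheckpoints;
        }
    }

    /** Hypothetical metric group with child groups that can be dropped as a unit. */
    static final class ToyMetricGroup {
        private final Map<String, Supplier<Long>> gauges = new HashMap<>();
        private final Map<String, ToyMetricGroup> children = new HashMap<>();

        /** Returns false on a name collision; the first registration wins. */
        boolean gauge(String name, Supplier<Long> gauge) {
            return gauges.putIfAbsent(name, gauge) == null;
        }

        ToyMetricGroup addGroup(String name) {
            return children.computeIfAbsent(name, k -> new ToyMetricGroup());
        }

        /** Drops a child group together with every metric registered on it. */
        void closeGroup(String name) {
            children.remove(name);
        }

        long read(String name) {
            return gauges.get(name).get();
        }
    }

    /** Option 1: a separate group per attempt, closed before the next attempt registers. */
    static long perAttemptGroup() {
        ToyMetricGroup jobGroup = new ToyMetricGroup();
        ToyExecutionGraph first = new ToyExecutionGraph(7);
        jobGroup.addGroup("executionGraph")
                .gauge("toyCheckpointCount", () -> first.completedCheckpoints);

        jobGroup.closeGroup("executionGraph"); // clean-up when the attempt ends
        ToyExecutionGraph second = new ToyExecutionGraph(0);
        ToyMetricGroup fresh = jobGroup.addGroup("executionGraph");
        if (!fresh.gauge("toyCheckpointCount", () -> second.completedCheckpoints)) {
            throw new IllegalStateException("unexpected collision");
        }
        return fresh.read("toyCheckpointCount");
    }

    /** Option 2: register once; the gauge follows a mutable reference to the current attempt. */
    static long mutableReference() {
        ToyMetricGroup jobGroup = new ToyMetricGroup();
        AtomicReference<ToyExecutionGraph> current =
                new AtomicReference<>(new ToyExecutionGraph(7));
        jobGroup.gauge("toyCheckpointCount", () -> current.get().completedCheckpoints);

        current.set(new ToyExecutionGraph(0)); // a restart only swaps the reference
        return jobGroup.read("toyCheckpointCount");
    }

    public static void main(String[] args) {
        System.out.println("per-attempt group: " + perAttemptGroup());
        System.out.println("mutable reference: " + mutableReference());
    }
}
```

In both variants the value read after the restart comes from the second attempt. The trade-off in this sketch is that option 1 needs an explicit clean-up step per attempt, while option 2 keeps a single registration for the lifetime of the job but requires every metric to tolerate the reference changing underneath it.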


            People

              chesnay Chesnay Schepler