FLINK-29615

MetricStore does not remove metrics of nonexistent subtasks when adaptive scheduler lowers job parallelism

Details

    Description

      We are exploring autoscaling Flink in Reactive mode, using metrics from the Flink REST API for guidance, and found that the metrics are not updated correctly.

       

      Problem

      MetricStore does not remove the metrics of nonexistent subtasks when the adaptive scheduler lowers job parallelism (i.e., the number of subtasks decreases). As a result, users see metrics of nonexistent subtasks in the Web UI (e.g., the task backpressure page) and in REST API responses. This causes confusion and wastes memory.

       

      Proposed Solution

      Thanks to FLINK-29132 and FLINK-28588, Flink now updates the current execution attempts when updating metrics. Since the current execution attempt info includes which subtasks are active, we can use it to retain only the metrics of active subtasks and remove the rest.
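
The retain step can be illustrated with a minimal sketch. It is not Flink's actual MetricStore code: the class `TaskMetricsSketch` and its methods `put`, `retainSubtasks`, and `subtaskIndexes` are hypothetical names chosen for illustration, and the set of active subtask indexes is assumed to have already been derived from the current execution attempt info.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for a per-task metric store; NOT Flink's real MetricStore.
class TaskMetricsSketch {
    // subtask index -> (metric name -> metric value)
    private final Map<Integer, Map<String, String>> subtasks = new ConcurrentHashMap<>();

    // Record a metric value for the given subtask.
    void put(int subtaskIndex, String metric, String value) {
        subtasks.computeIfAbsent(subtaskIndex, k -> new ConcurrentHashMap<>()).put(metric, value);
    }

    // Drop metrics of every subtask that is not in the active set.
    // Assumption: the caller extracted activeSubtaskIndexes from the
    // current execution attempt info.
    void retainSubtasks(Set<Integer> activeSubtaskIndexes) {
        subtasks.keySet().retainAll(activeSubtaskIndexes);
    }

    // Sorted copy of the subtask indexes that still have metrics.
    Set<Integer> subtaskIndexes() {
        return new TreeSet<>(subtasks.keySet());
    }

    public static void main(String[] args) {
        TaskMetricsSketch store = new TaskMetricsSketch();
        // Job starts with parallelism 4: subtasks 0..3 report metrics.
        for (int i = 0; i < 4; i++) {
            store.put(i, "numRecordsIn", "100");
        }
        // Parallelism is lowered to 2: only subtasks 0 and 1 remain active.
        store.retainSubtasks(Set.of(0, 1));
        System.out.println(store.subtaskIndexes()); // prints [0, 1]
    }
}
```

Without the `retainSubtasks` call, the entries for subtasks 2 and 3 would stay in the map indefinitely, which is the behavior described in the Problem section.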

       

            People

              Zhanghao Chen