Details
- Bug
- Status: Closed
- Major
- Resolution: Fixed
- 1.15.0, 1.16.0
Description
We are exploring autoscaling Flink in Reactive mode, using metrics from the Flink REST API for guidance, and found that the metrics are not updated correctly.
Problem
MetricStore does not remove the metrics of nonexistent subtasks when the adaptive scheduler lowers job parallelism (i.e., the number of subtasks decreases). As a result, users see metrics of nonexistent subtasks in the Web UI (e.g., the task backpressure page) and in REST API responses. This causes confusion and occupies extra memory.
Proposed Solution
Thanks to FLINK-29132 and FLINK-28588, Flink now updates the current execution attempts whenever it updates metrics. Since the current execution attempt info includes the set of active subtasks, we can use it to retain only the active subtasks in MetricStore and drop the metrics of subtasks that no longer exist.
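The retain step described above could look roughly like the following. This is a minimal sketch, not Flink's actual MetricStore code: the class `SubtaskMetricStoreSketch`, its methods, and the shape of the `currentAttempts` map (subtask index to current attempt number) are all assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical stand-in for a per-task metric store; NOT Flink's MetricStore API. */
final class SubtaskMetricStoreSketch {

    // subtask index -> (metric name -> metric value)
    private final Map<Integer, Map<String, String>> subtaskMetrics = new HashMap<>();

    /** Records one metric value for the given subtask. */
    void put(int subtaskIndex, String name, String value) {
        subtaskMetrics.computeIfAbsent(subtaskIndex, k -> new HashMap<>()).put(name, value);
    }

    /**
     * Drops metrics of every subtask that is absent from the current execution
     * attempts. The map shape (subtask index -> attempt number) is an assumption.
     */
    void retainActiveSubtasks(Map<Integer, Integer> currentAttempts) {
        // Removing from the key-set view also removes the entries from the map.
        subtaskMetrics.keySet().retainAll(currentAttempts.keySet());
    }

    /** Returns the subtask indexes that still have metrics, in sorted order. */
    Set<Integer> subtaskIndexes() {
        return new TreeSet<>(subtaskMetrics.keySet());
    }

    public static void main(String[] args) {
        SubtaskMetricStoreSketch store = new SubtaskMetricStoreSketch();
        // The job starts with parallelism 4.
        for (int i = 0; i < 4; i++) {
            store.put(i, "numRecordsIn", "100");
        }
        // The adaptive scheduler lowers parallelism to 2: only subtasks 0 and 1 remain.
        Map<Integer, Integer> currentAttempts = new HashMap<>();
        currentAttempts.put(0, 1);
        currentAttempts.put(1, 1);
        store.retainActiveSubtasks(currentAttempts);
        System.out.println(store.subtaskIndexes()); // prints [0, 1]
    }
}
```

Pruning on every metrics update keeps the store consistent with the latest parallelism without requiring a separate cleanup pass.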