According to the official documentation, when pushing metrics to the Prometheus Pushgateway you need to provide an "instance" label.
In practice, without the "instance" label, Prometheus stores the metrics incorrectly: the series are not continuous and a lot of data is lost. After adding the instance label, everything returns to normal.
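Here is a minimal sketch of what attaching the label looks like on the wire. It assumes the Pushgateway's documented URL scheme `/metrics/job/<JOB_NAME>{/<LABEL_NAME>/<LABEL_VALUE>}`, where `instance` is just one more label in the grouping key. The function `pushgateway_url` and the job and instance values are hypothetical. The sketch only builds the URL and sends nothing.

```python
from typing import Mapping
from urllib.parse import quote


def pushgateway_url(base_url: str, job: str, grouping_key: Mapping[str, str]) -> str:
    """Build a Pushgateway push URL: <base_url>/metrics/job/<job>/<label>/<value>/...

    Assumption: no job name or label value contains '/'. The Pushgateway expects
    such values to be base64-encoded, which this sketch does not implement.
    """
    for value in [job, *grouping_key.values()]:
        if "/" in value:
            raise ValueError(f"value {value!r} contains '/'; it would need base64 encoding")
    path = "/metrics/job/" + quote(job, safe="")
    # Each label (e.g. "instance") becomes an extra /<name>/<value> path segment.
    for label, value in grouping_key.items():
        path += f"/{quote(label, safe='')}/{quote(value, safe='')}"
    return base_url.rstrip("/") + path


print(pushgateway_url("http://localhost:9091", "my-flink-job", {"instance": "taskmanager-1"}))
# http://localhost:9091/metrics/job/my-flink-job/instance/taskmanager-1
```

Without the `instance` segment, every process that pushes for the same job lands in the same group on the Pushgateway.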
In Prometheus terms, an endpoint you can scrape is called an instance, usually corresponding to a single process. A collection of instances with the same purpose, a process replicated for scalability or reliability for example, is called a job.
For example, an API server job with four replicated instances:
– instance 1: 1.2.3.4:5670
– instance 2: 1.2.3.4:5671
– instance 3: 5.6.7.8:5670
– instance 4: 5.6.7.8:5671
I think a Flink job corresponds to a Prometheus job, while each TaskManager and the JobManager correspond to different instances. If the jobName is used as the instance label, the same metric reported by different TaskManagers ends up under the same grouping key on the Pushgateway. The pushes then overwrite each other, and aggregations such as sum return wrong results.
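The conflict can be illustrated with a toy, in-memory model of the Pushgateway's grouping behavior. This is a simplification, not the real server. It models only the fact that a push replaces what is already stored under the same grouping key. The real Pushgateway's PUT replaces the whole group and its POST replaces metrics with the same name, so both overwrite in this scenario. `FakePushgateway`, `simulate`, the TaskManager IDs, and the `numRecordsIn` values are hypothetical.

```python
from typing import Callable, Dict, Tuple

GroupingKey = Tuple[Tuple[str, str], ...]


class FakePushgateway:
    """Hypothetical stand-in for the Pushgateway: metrics are stored per
    grouping key, and a push replaces whatever that key held before."""

    def __init__(self) -> None:
        self.groups: Dict[GroupingKey, Dict[str, float]] = {}

    def push(self, labels: Dict[str, str], metrics: Dict[str, float]) -> None:
        key = tuple(sorted(labels.items()))
        self.groups[key] = dict(metrics)  # overwrite the whole group

    def sum_metric(self, name: str) -> float:
        # Roughly what sum(<name>) would see after Prometheus scrapes the gateway.
        return sum(group.get(name, 0.0) for group in self.groups.values())


def simulate(instance_for: Callable[[str], str]) -> float:
    gw = FakePushgateway()
    # Two hypothetical TaskManagers of the same Flink job, each with its own count.
    for tm_id, records in [("taskmanager-1", 100.0), ("taskmanager-2", 250.0)]:
        labels = {"job": "my-flink-job", "instance": instance_for(tm_id)}
        gw.push(labels, {"numRecordsIn": records})
    return gw.sum_metric("numRecordsIn")


# instance = jobName: both TaskManagers share one grouping key, so the last push wins.
print(simulate(lambda tm_id: "my-flink-job"))  # 250.0
# instance = TaskManager ID: two separate groups, so the sum covers both.
print(simulate(lambda tm_id: tm_id))  # 350.0
```

Using a per-process value for `instance` keeps each TaskManager's series distinct. The job label still ties them together for aggregation.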