Uploaded image for project: 'Flink'
  1. Flink
  2. FLINK-10857

Conflict between JMX and Prometheus Metrics reporter

    XMLWordPrintableJSON

Details

    Description

      When registering both JMX and Prometheus metrics reporter, the Prometheus reporter will fail with many exceptions, such as.

       

      o.a.f.r.m.MetricRegistryImpl Error while registering metric.
      java.lang.IllegalArgumentException: Invalid metric name: flink_jobmanager.Status.JVM.Memory.Mapped_Count
      	at org.apache.flink.shaded.io.prometheus.client.Collector.checkMetricName(Collector.java:182)
      	at org.apache.flink.shaded.io.prometheus.client.SimpleCollector.<init>(SimpleCollector.java:164)
      	at org.apache.flink.shaded.io.prometheus.client.Gauge.<init>(Gauge.java:68)
      	at org.apache.flink.shaded.io.prometheus.client.Gauge$Builder.create(Gauge.java:74)
      	at org.apache.flink.metrics.prometheus.AbstractPrometheusReporter.createCollector(AbstractPrometheusReporter.java:130)
      	at org.apache.flink.metrics.prometheus.AbstractPrometheusReporter.notifyOfAddedMetric(AbstractPrometheusReporter.java:106)
      	at org.apache.flink.runtime.metrics.MetricRegistryImpl.register(MetricRegistryImpl.java:329)
      	at org.apache.flink.runtime.metrics.groups.AbstractMetricGroup.addMetric(AbstractMetricGroup.java:379)
      	at org.apache.flink.runtime.metrics.groups.AbstractMetricGroup.gauge(AbstractMetricGroup.java:323)
      	at org.apache.flink.runtime.metrics.util.MetricUtils.instantiateMemoryMetrics(MetricUtils.java:231)
      	at org.apache.flink.runtime.metrics.util.MetricUtils.instantiateStatusMetrics(MetricUtils.java:100)
      	at org.apache.flink.runtime.metrics.util.MetricUtils.instantiateJobManagerMetricGroup(MetricUtils.java:68)
      	at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.startClusterComponents(ClusterEntrypoint.java:342)
      	at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.runCluster(ClusterEntrypoint.java:233)
      	at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.lambda$startCluster$0(ClusterEntrypoint.java:191)
      	at java.security.AccessController.doPrivileged(Native Method)
      	at javax.security.auth.Subject.doAs(Subject.java:422)
      	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1869)
      	at org.apache.flink.runtime.security.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:41)
      	at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.startCluster(ClusterEntrypoint.java:190)
      	at org.apache.flink.yarn.entrypoint.YarnJobClusterEntrypoint.main(YarnJobClusterEntrypoint.java:176)
      

       

      This is a small program to reproduce the problem:

      https://github.com/dikei/flink-metrics-conflict-test

       

      I

      Attachments

        Issue Links

          Activity

            People

              chesnay Chesnay Schepler
              kien_truong Truong Duc Kien
              Votes:
              0 Vote for this issue
              Watchers:
              2 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: