FLINK-10857: Conflict between JMX and Prometheus Metrics reporter


    Details

      Description

      When both the JMX and the Prometheus metrics reporters are registered, the Prometheus reporter fails with many exceptions such as the following:

       

      o.a.f.r.m.MetricRegistryImpl Error while registering metric.
      java.lang.IllegalArgumentException: Invalid metric name: flink_jobmanager.Status.JVM.Memory.Mapped_Count
      	at org.apache.flink.shaded.io.prometheus.client.Collector.checkMetricName(Collector.java:182)
      	at org.apache.flink.shaded.io.prometheus.client.SimpleCollector.<init>(SimpleCollector.java:164)
      	at org.apache.flink.shaded.io.prometheus.client.Gauge.<init>(Gauge.java:68)
      	at org.apache.flink.shaded.io.prometheus.client.Gauge$Builder.create(Gauge.java:74)
      	at org.apache.flink.metrics.prometheus.AbstractPrometheusReporter.createCollector(AbstractPrometheusReporter.java:130)
      	at org.apache.flink.metrics.prometheus.AbstractPrometheusReporter.notifyOfAddedMetric(AbstractPrometheusReporter.java:106)
      	at org.apache.flink.runtime.metrics.MetricRegistryImpl.register(MetricRegistryImpl.java:329)
      	at org.apache.flink.runtime.metrics.groups.AbstractMetricGroup.addMetric(AbstractMetricGroup.java:379)
      	at org.apache.flink.runtime.metrics.groups.AbstractMetricGroup.gauge(AbstractMetricGroup.java:323)
      	at org.apache.flink.runtime.metrics.util.MetricUtils.instantiateMemoryMetrics(MetricUtils.java:231)
      	at org.apache.flink.runtime.metrics.util.MetricUtils.instantiateStatusMetrics(MetricUtils.java:100)
      	at org.apache.flink.runtime.metrics.util.MetricUtils.instantiateJobManagerMetricGroup(MetricUtils.java:68)
      	at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.startClusterComponents(ClusterEntrypoint.java:342)
      	at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.runCluster(ClusterEntrypoint.java:233)
      	at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.lambda$startCluster$0(ClusterEntrypoint.java:191)
      	at java.security.AccessController.doPrivileged(Native Method)
      	at javax.security.auth.Subject.doAs(Subject.java:422)
      	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1869)
      	at org.apache.flink.runtime.security.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:41)
      	at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.startCluster(ClusterEntrypoint.java:190)
      	at org.apache.flink.yarn.entrypoint.YarnJobClusterEntrypoint.main(YarnJobClusterEntrypoint.java:176)
      

       

      This is a small program to reproduce the problem:

      https://github.com/dikei/flink-metrics-conflict-test
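      For context on the exception above: the Prometheus data model only accepts metric names matching [a-zA-Z_:][a-zA-Z0-9_:]*, so a name containing dots is rejected. The following is a minimal standalone sketch of that rule, not the code of the Flink reporter or of the Prometheus client. The class MetricNameCheck and its helpers isValid and sanitize are hypothetical names introduced for illustration, and the sketch does not attempt to explain why registering the JMX reporter leads to a dotted name reaching the Prometheus reporter.

```java
import java.util.regex.Pattern;

// Hypothetical illustration class; not part of Flink or the Prometheus client.
public class MetricNameCheck {

    // Naming rule from the Prometheus data model for metric names.
    private static final Pattern VALID_NAME =
            Pattern.compile("[a-zA-Z_:][a-zA-Z0-9_:]*");

    // Any single character that is not allowed in a metric name.
    private static final Pattern INVALID_CHAR =
            Pattern.compile("[^a-zA-Z0-9_:]");

    // Returns true if the whole name satisfies the naming rule.
    public static boolean isValid(String name) {
        return VALID_NAME.matcher(name).matches();
    }

    // Assumed workaround for illustration: replace each disallowed
    // character with an underscore. A leading digit is not handled here.
    public static String sanitize(String name) {
        return INVALID_CHAR.matcher(name).replaceAll("_");
    }

    public static void main(String[] args) {
        // Name taken from the exception message in this report.
        String reported = "flink_jobmanager.Status.JVM.Memory.Mapped_Count";
        System.out.println(isValid(reported));            // false: contains dots
        System.out.println(sanitize(reported));
        System.out.println(isValid(sanitize(reported)));  // true
    }
}
```

      With the name from the report, the check fails only because of the '.' separators; once they are replaced with underscores the name passes the same rule.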

       



            People

            • Assignee: Chesnay Schepler (chesnay)
            • Reporter: Truong Duc Kien (kien_truong)
