FLINK-10521: Faulty Histogram stops Prometheus metrics from being reported

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Won't Fix
    • Affects Version/s: 1.6.1
    • Fix Version/s: None
    • Component/s: Runtime / Metrics
    • Labels: None

    Description

      In my setup I am using the Prometheus reporter together with a custom histogram metric. After a while the histogram starts throwing exceptions (because it is rather poorly implemented). From that point on, all metrics on the TaskManager where the histogram is running stop being reported. The Prometheus logs show that requests to taskmanager:9249/metrics return an empty response once a metric is faulty.

      Expected:

      A faulty metric implementation causes only that specific metric to stop being reported.

      Actual:

      A faulty metric causes all metrics on that TaskManager to stop being reported.
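
The expected behaviour can be illustrated with a minimal, self-contained sketch. This is not Flink's PrometheusReporter code, and it does not use the Flink or Prometheus client APIs; the class `IsolatedScrape`, the method `render`, and the metric names are hypothetical and exist only for illustration. The sketch assumes each metric can be read through a `Supplier<Double>` and shows the idea of guarding each metric individually, so that one throwing metric is skipped while the rest are still written to the response.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Supplier;

public class IsolatedScrape {

    /**
     * Renders "name value" lines for all metrics (hypothetical helper).
     * A metric whose supplier throws is logged and skipped instead of
     * aborting the whole response.
     */
    public static String render(Map<String, Supplier<Double>> metrics) {
        StringBuilder out = new StringBuilder();
        for (Map.Entry<String, Supplier<Double>> entry : metrics.entrySet()) {
            try {
                // Read the value first so a failure leaves no partial line behind.
                double value = entry.getValue().get();
                out.append(entry.getKey()).append(' ').append(value).append('\n');
            } catch (RuntimeException ex) {
                // Only the faulty metric is dropped from this scrape.
                System.err.println("Skipping faulty metric " + entry.getKey() + ": " + ex);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, Supplier<Double>> metrics = new LinkedHashMap<>();
        metrics.put("records_in", () -> 42.0);
        metrics.put("custom_histogram_p99", () -> {
            // Stands in for the poorly implemented histogram from the report.
            throw new IllegalStateException("broken histogram");
        });
        metrics.put("records_out", () -> 40.0);

        // The two healthy metrics are still printed; the faulty one is skipped.
        System.out.print(render(metrics));
    }
}
```

With this kind of per-metric guard, a scrape would still return the healthy metrics rather than an empty response. Whether such isolation belongs in the reporter or in the metric implementation is a design question the sketch does not settle.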

      Attachments

        1. taskmanager.log
          375 kB
          Florian Schmidt
        2. prometheus.log
          56 kB
          Florian Schmidt
        3. Screenshot 2018-10-10 at 11.32.59.png
          36 kB
          Florian Schmidt

        People

            Assignee: Unassigned
            Reporter: Florian Schmidt
            Votes: 0
            Watchers: 3