Apache Airflow: AIRFLOW-3177

Change scheduler_heartbeat metric from gauge to counter


    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 2.0.0
    • Fix Version/s: 1.10.1
    • Component/s: scheduler
    • Labels:
      None

      Description

      Currently, the scheduler_heartbeat metric exposed through the statsd integration is a gauge. I'm proposing to change it to a counter for better integration with Prometheus via the [statsd_exporter|https://github.com/prometheus/statsd_exporter].
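To make the difference concrete: statsd clients emit plain-text lines of the form `<name>:<value>|<type>`, where `g` marks a gauge and `c` a counter. The `statsd_line` helper below is hypothetical, and the `airflow.` prefix is an assumption about the configured statsd prefix; the sketch only shows how the same heartbeat would look on the wire before and after the proposed change.

```python
def statsd_line(name: str, value: int, metric_type: str) -> str:
    """Format one metric in the plain-text statsd wire format <name>:<value>|<type>."""
    if metric_type not in ("g", "c"):
        raise ValueError("expected 'g' (gauge) or 'c' (counter)")
    return f"{name}:{value}|{metric_type}"


# Current behaviour (as described above): each heartbeat sets a gauge to 1.
gauge_line = statsd_line("airflow.scheduler_heartbeat", 1, "g")
# Proposed behaviour: each heartbeat increments a counter by 1.
counter_line = statsd_line("airflow.scheduler_heartbeat", 1, "c")

print(gauge_line)    # airflow.scheduler_heartbeat:1|g
print(counter_line)  # airflow.scheduler_heartbeat:1|c
```

Only the type suffix changes; the receiving side decides what to do with it, which is where the gauge/counter distinction matters.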

      Rather than pointing Airflow at an actual statsd server, you can point it at this exporter, which accumulates the metrics and exposes them at /metrics for Prometheus to scrape. The problem is that the exporter keeps the last value it received for a gauge: once the scheduler sets the metric to 1 on its first loop, it is exposed to Prometheus as 1 from then on. If the scheduler crashes or is turned off, the exporter keeps reporting 1 until the exporter itself is restarted and rebuilds its internal state.
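A toy model of the behaviour described above. `ExporterSketch` and `two_scrapes` are hypothetical and heavily simplified (no sample rates, no metric mapping); the sketch only assumes that the exporter keeps the last received value for a gauge and accumulates increments for a counter, then compares a scheduler that keeps running with one that crashes between two scrapes.

```python
class ExporterSketch:
    """Hypothetical, simplified model of a statsd-to-Prometheus bridge's state."""

    def __init__(self) -> None:
        self.values: dict[str, float] = {}

    def receive(self, line: str) -> None:
        """Apply one statsd line of the form <name>:<value>|<type>."""
        name, rest = line.split(":", 1)
        value, metric_type = rest.split("|", 1)
        if metric_type == "g":
            # Gauge: overwrite with the latest value; nothing ever expires it.
            self.values[name] = float(value)
        elif metric_type == "c":
            # Counter: add the increment to the running total.
            self.values[name] = self.values.get(name, 0.0) + float(value)

    def scrape(self, name: str) -> float:
        """What Prometheus would read for this metric at scrape time."""
        return self.values[name]


def two_scrapes(metric_type: str, scheduler_alive: bool) -> tuple[float, float]:
    """Scrape after three heartbeats, then again after the scheduler either
    sends three more heartbeats (alive) or sends nothing (crashed)."""
    exporter = ExporterSketch()
    name = "airflow.scheduler_heartbeat"
    line = f"{name}:1|{metric_type}"
    for _ in range(3):
        exporter.receive(line)
    first = exporter.scrape(name)
    if scheduler_alive:
        for _ in range(3):
            exporter.receive(line)
    second = exporter.scrape(name)
    return first, second


print(two_scrapes("g", True), two_scrapes("g", False))  # (1.0, 1.0) (1.0, 1.0)
print(two_scrapes("c", True), two_scrapes("c", False))  # (3.0, 6.0) (3.0, 3.0)
```

With the gauge, a healthy and a crashed scheduler produce identical scrapes. With the counter, a crashed scheduler shows up as a value that stops increasing.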

      By turning this metric into a counter, we can detect scheduler problems by graphing and alerting on its rate of increase. If that rate drops below the expected value (determined by the scheduler_heartbeat_sec setting), we can fire an alert.
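A minimal sketch of that alerting logic, assuming the scheduler emits roughly one increment every scheduler_heartbeat_sec seconds. The function names and the `tolerance` parameter are hypothetical, and the rate calculation is only loosely modelled on PromQL's `rate()`: it treats any drop in the counter as a reset but does not extrapolate to the edges of the window.

```python
from typing import List, Tuple

# (unix timestamp in seconds, counter value) as scraped from the exporter
Sample = Tuple[float, float]


def per_second_rate(samples: List[Sample]) -> float:
    """Average per-second increase across the samples.

    Any decrease is treated as a counter reset (e.g. the exporter restarted),
    so the post-reset value counts as the increase for that interval.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    increase = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        increase += cur - prev if cur >= prev else cur
    elapsed = samples[-1][0] - samples[0][0]
    if elapsed <= 0:
        raise ValueError("samples must span a positive time window")
    return increase / elapsed


def heartbeat_alert(samples: List[Sample], scheduler_heartbeat_sec: float,
                    tolerance: float = 0.5) -> bool:
    """Fire when the observed rate falls below `tolerance` times the expected
    rate of one heartbeat per scheduler_heartbeat_sec seconds."""
    expected = 1.0 / scheduler_heartbeat_sec
    return per_second_rate(samples) < expected * tolerance


healthy = [(0, 100), (15, 103), (30, 106), (45, 109), (60, 112)]
stalled = [(0, 100), (15, 103), (30, 103), (45, 103), (60, 103)]
after_reset = [(0, 100), (15, 103), (30, 2), (45, 5), (60, 8)]

print(heartbeat_alert(healthy, 5))      # False
print(heartbeat_alert(stalled, 5))      # True
print(heartbeat_alert(after_reset, 5))  # False
```

In a real deployment the same check would more likely live in a Prometheus alerting rule built on `rate()` over the exported counter; the Python version just makes the threshold arithmetic explicit (with a 5 s heartbeat, the expected rate is 0.2 per second and the alert threshold here is 0.1).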

      This should help adoption in Kubernetes environments, where Prometheus is effectively the standard.

              People

              • Assignee:
                schnie Greg Neiheisel
                Reporter:
                schnie Greg Neiheisel
              • Votes:
                0
                Watchers:
                3