FLINK-11457

PrometheusPushGatewayReporter does not cleanup its metrics

Details

    Description

      When cancelling a job running on a YARN-based cluster and then shutting down the cluster, the metrics on the push gateway are not deleted.

      My yarn-conf.yaml settings:

      metrics.reporters: promgateway
      metrics.reporter.promgateway.class: org.apache.flink.metrics.prometheus.PrometheusPushGatewayReporter
      metrics.reporter.promgateway.host: pushgateway.gcpstg.bolcom.net
      metrics.reporter.promgateway.port: 9091
      metrics.reporter.promgateway.jobName: PSMF
      metrics.reporter.promgateway.randomJobNameSuffix: true
      metrics.reporter.promgateway.deleteOnShutdown: true
      metrics.reporter.promgateway.interval: 30 SECONDS
      

      What I expect to happen:

      • when running, the metrics are pushed to the push gateway under a separate label per node (jobmanager/taskmanager)
      • when shutting down, the metrics are deleted from the push gateway

      This last bit does not happen.
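
      To confirm what is left behind, and to clean it up by hand in the meantime, something like the sketch below may help. It assumes the reporter pushes each node's metrics under a `job` grouping label that starts with the configured jobName (PSMF plus the random suffix), and that the push gateway exposes its usual `/metrics` listing and `DELETE /metrics/job/<name>` endpoint. The function names and the `PUSHGATEWAY_URL` / `JOB_PREFIX` variables are hypothetical, and the `http://` scheme is a guess based on the host and port in the config above.

```shell
#!/usr/bin/env bash
# Sketch only: list and delete leftover push gateway groups for one job name prefix.
# Assumption: host, port and prefix are taken from the reporter config in this issue.
PUSHGATEWAY_URL="${PUSHGATEWAY_URL:-http://pushgateway.gcpstg.bolcom.net:9091}"
JOB_PREFIX="${JOB_PREFIX:-PSMF}"

# Read Prometheus text-format metrics on stdin and print the unique values
# of the job label that start with the given prefix.
list_job_groups() {
  local prefix="$1"
  grep -o "job=\"${prefix}[^\"]*\"" | sed -e 's/^job="//' -e 's/"$//' | sort -u
}

# Build the URL of the group that has only a job label as grouping key.
# Note: the job name is not URL-encoded here, so it must be URL-safe.
group_url() {
  printf '%s/metrics/job/%s\n' "${1%/}" "$2"
}

# Fetch the current metrics, find matching groups and delete each of them.
cleanup_job_groups() {
  curl -fsS "${PUSHGATEWAY_URL%/}/metrics" \
    | list_job_groups "${JOB_PREFIX}" \
    | while read -r job; do
        echo "Deleting group ${job}"
        curl -fsS -X DELETE "$(group_url "${PUSHGATEWAY_URL}" "${job}")"
      done
}
```

      Running `cleanup_job_groups` after the cluster has been stopped should remove every group whose job name starts with the prefix. If other applications share that prefix, their groups would be deleted as well, so the prefix needs to be chosen with care.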

      How the job is run:

      flink run -m yarn-cluster -yn 5 -ys 2 -yst "$INSTALL_DIRECTORY/app/psmf.jar"

       

      How the job is stopped:

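      # Look up the YARN application that hosts the PSMF job
      YARN_APP_ID=$(yarn application -list | grep "PSMF" | awk '{print $1}')
      # Look up the ID of the running Flink job inside that application
      FLINK_JOB_ID=$(flink list -r -yid "${YARN_APP_ID}" | grep "PSMF" | awk '{print $4}')
      # Cancel the job with a savepoint
      flink cancel -s "${SAVEPOINT_DIR%/}/" -yid "${YARN_APP_ID}" "${FLINK_JOB_ID}"
      # Shut down the YARN session (and with it the cluster)
      echo "stop" | yarn-session.sh -id "${YARN_APP_ID}"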
      

       

      Is there anything I'm doing wrong? Anything I can do to help fix this?

          People

            Assignee: Unassigned
            Reporter: Oscar Westra van Holthe - Kind (opwvhk)