Details
Type: Bug
Status: Open
Priority: Not a Priority
Resolution: Unresolved
Description
When cancelling a job running on a YARN-based cluster and then shutting down the cluster, the metrics on the push gateway are not deleted.
My yarn-conf.yaml settings:
metrics.reporters: promgateway
metrics.reporter.promgateway.class: org.apache.flink.metrics.prometheus.PrometheusPushGatewayReporter
metrics.reporter.promgateway.host: pushgateway.gcpstg.bolcom.net
metrics.reporter.promgateway.port: 9091
metrics.reporter.promgateway.jobName: PSMF
metrics.reporter.promgateway.randomJobNameSuffix: true
metrics.reporter.promgateway.deleteOnShutdown: true
metrics.reporter.promgateway.interval: 30 SECONDS
What I expect to happen:
- when running, the metrics are pushed to the push gateway to a separate label per node (jobmanager/taskmanager)
- when shutting down, the metrics are deleted from the push gateway
This last bit does not happen.
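One way to confirm this could be to list the job groups still present on the push gateway after the cluster has shut down. The following is a minimal sketch, not part of the original setup: the helper name `list_pushgateway_jobs` is hypothetical, and it assumes the gateway's `/metrics` endpoint returns the usual Prometheus text format with a `job="..."` label on each sample.

```shell
# Hypothetical helper: reads Prometheus text-format metrics on stdin and
# prints the distinct values of the `job` label that start with the given prefix.
list_pushgateway_jobs() {
  prefix="$1"
  # Extract every job="..." label (preceded by `{` or `,`), strip the wrapper,
  # keep only names with the requested prefix, and de-duplicate.
  grep -o '[{,]job="[^"]*"' \
    | sed 's/^.job="//; s/"$//' \
    | grep "^${prefix}" \
    | sort -u
}
```

Assuming the host and port from the settings above, it might be used as `curl -s http://pushgateway.gcpstg.bolcom.net:9091/metrics | list_pushgateway_jobs PSMF`. If deleteOnShutdown worked as expected, this should print nothing once the cluster is gone; any remaining `PSMF...` names would be the groups that were left behind.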
How the job is run:
flink run -m yarn-cluster -yn 5 -ys 2 -yst "$INSTALL_DIRECTORY/app/psmf.jar"
How the job is stopped:
YARN_APP_ID=$(yarn application -list | grep "PSMF" | awk '{print $1}')
FLINK_JOB_ID=$(flink list -r -yid ${YARN_APP_ID} | grep "PSMF" | awk '{print $4}')
flink cancel -s "${SAVEPOINT_DIR%/}/" -yid "${YARN_APP_ID}" "${FLINK_JOB_ID}"
echo "stop" | yarn-session.sh -id ${YARN_APP_ID}
Is there anything I'm doing wrong? Anything I can help to fix?