Flink / FLINK-32173

Flink job metrics return a stale value on the first request after the value changes

Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 1.17.0
    • Fix Version/s: None
    • Component/s: Runtime / Metrics
    • Labels: None

    Description

      The Flink job metrics REST endpoint returns a stale value on the first request after a metric's value has changed. The correct value is returned only from the second request onward.

      Repro:

      1. Run a Flink job with the fixed-delay restart strategy and a large number of restart attempts.

      restart-strategy: fixed-delay
      restart-strategy.fixed-delay.attempts: 10000
      
      
      flink run -Dexecution.checkpointing.interval="10s" -d -c org.apache.flink.streaming.examples.wordcount.WordCount /usr/lib/flink/examples/streaming/WordCount.jar
      

      2. Kill one of the TaskManagers, which triggers a job restart.

      3. After the job has restarted, fetch any job metric. The first request returns the stale (older) value, 48.

      [hadoop@ip-172-31-44-70 ~]$ curl http://jobmanager:52000/jobs/d24f7d74d541f1215a65395e0ebd898c/metrics?get=numRestarts  | jq .
      [
        {
          "id": "numRestarts",
          "value": "48"
        }
      ]
      

      4. Subsequent requests return the correct value, 49.

      [hadoop@ip-172-31-44-70 ~]$ curl http://jobmanager:52000/jobs/d24f7d74d541f1215a65395e0ebd898c/metrics?get=numRestarts  | jq .
      [
        {
          "id": "numRestarts",
          "value": "49"
        }
      ]
      

      5. Repeat steps 2 to 4. Each time, the first request after a metric update returns the value from before the update; only the next request returns the correct value.
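
Steps 3 and 4 can be automated with a small script that reads the metric several times in a row and flags the case where only the first read differs. This is a minimal sketch, not part of the original report: the function names (`parse_metric`, `fetch_metric`, `first_read_is_stale`, `check_first_read`) are hypothetical, and the endpoint path and response shape are assumed to match the `curl` output above.

```python
import json
import time
import urllib.request
from typing import List, Optional


def parse_metric(body: str, metric_id: str) -> Optional[str]:
    """Return the value for metric_id from a metrics response body, or None."""
    # Assumed response shape: [{"id": "numRestarts", "value": "48"}]
    for entry in json.loads(body):
        if entry.get("id") == metric_id:
            return entry.get("value")
    return None


def fetch_metric(base_url: str, job_id: str, metric_id: str) -> Optional[str]:
    """Fetch a single job metric from the JobManager REST endpoint."""
    url = f"{base_url}/jobs/{job_id}/metrics?get={metric_id}"
    with urllib.request.urlopen(url, timeout=10) as resp:
        return parse_metric(resp.read().decode("utf-8"), metric_id)


def first_read_is_stale(reads: List[Optional[str]]) -> bool:
    """True if the first read differs from the later reads, which all agree."""
    return (
        len(reads) >= 2
        and reads[0] != reads[1]
        and len(set(reads[1:])) == 1
    )


def check_first_read(base_url: str, job_id: str, metric_id: str,
                     attempts: int = 3, pause_s: float = 1.0) -> bool:
    """Read the metric `attempts` times and report whether only the first read was stale."""
    reads = []
    for _ in range(attempts):
        reads.append(fetch_metric(base_url, job_id, metric_id))
        time.sleep(pause_s)
    print(reads)
    return first_read_is_stale(reads)
```

Calling `check_first_read("http://jobmanager:52000", "d24f7d74d541f1215a65395e0ebd898c", "numRestarts")` immediately after a restart would be expected to return `True` if the behaviour described above is present. The check is deliberately strict (all later reads must agree) so that a metric that is still changing is not reported as stale.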

          People

            Assignee: Unassigned
            Reporter: Prabhu Joseph (prabhujoseph)