FLINK-25888

Capture time that the job spends on deploying tasks


Details

    Description

      FLINK-23976 added standardized metrics for capturing how much time we spend in each JobStatus. In practice, however, certain states consist of several stages; for example, the RUNNING state also includes the deployment of tasks.

      To get a better picture of where time is spent, I propose adding new metrics that capture the deployingTime based on the execution states. This will additionally get us closer to a proper uptime metric, which ideally would be runningTime minus the various stage time metrics.

      A job is considered to be deploying:

      • for batch jobs, if no task is running and at least one task is being deployed
      • for streaming jobs, if at least one task is being deployed

      The semantics are different for batch and streaming jobs because they differ in how they make progress. For a streaming job, all tasks need to be deployed for checkpointing to work. For batch jobs, any deployed task immediately starts progressing the job.
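
      The two rules above can be sketched as a single predicate. This is only an illustration of the proposed semantics, not the actual implementation: the class DeployingCheck and the enums JobType and TaskState are hypothetical stand-ins defined here and are not Flink's own classes.

```java
import java.util.List;

// Hypothetical sketch of the proposed "deploying" semantics.
// None of these types are Flink classes; they exist only for illustration.
final class DeployingCheck {

    enum JobType { BATCH, STREAMING }

    // Simplified task states, assumed for this example.
    enum TaskState { SCHEDULED, DEPLOYING, RUNNING, FINISHED }

    // Returns true if the job counts as "deploying" under the rules above.
    static boolean isDeploying(JobType type, List<TaskState> tasks) {
        boolean anyDeploying = tasks.contains(TaskState.DEPLOYING);
        if (type == JobType.STREAMING) {
            // Streaming: a single task still being deployed is enough,
            // since checkpointing needs every task to be deployed.
            return anyDeploying;
        }
        // Batch: a running task already progresses the job, so the job
        // only counts as deploying while no task is running.
        boolean anyRunning = tasks.contains(TaskState.RUNNING);
        return anyDeploying && !anyRunning;
    }
}
```

      With one task RUNNING and another DEPLOYING, this predicate reports a streaming job as deploying and a batch job as not deploying, which is the difference the paragraph above describes.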

            People

              chesnay Chesnay Schepler