  1. Flink
  2. FLINK-7894

Improve metrics around fine-grained recovery and associated checkpointing behaviors

Details

    Description

      Currently, the only metric around fine-grained recovery is "task_failures". It is a very high-level metric; it would be nice to have the following improvements:

      • Allow slicing and dicing by which tasks were restarted.
      • Report recovery duration.
      • Report recovery-associated checkpoint behaviors: cancels, failures, etc.
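
The three requested improvements can be illustrated with a minimal, framework-free sketch. Everything here is an assumption made for illustration: the class `RecoveryMetrics`, its method names, and the `CheckpointOutcome` enum are hypothetical and are not part of Flink's API, and the sketch does not show how such values would be registered with Flink's metric system.

```java
import java.util.EnumMap;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of the requested metrics; not part of Flink. */
class RecoveryMetrics {

    /** Checkpoint outcomes attributed to a recovery (assumed categories). */
    enum CheckpointOutcome { CANCELLED, FAILED }

    // Restart count per task name, so restarts can be broken down by task.
    private final Map<String, Integer> restartsByTask = new HashMap<>();
    // Start timestamp of each recovery that is still in progress.
    private final Map<String, Long> recoveryStartMillis = new HashMap<>();
    // Number of checkpoints cancelled or failed because of recoveries.
    private final Map<CheckpointOutcome, Integer> checkpointOutcomes =
            new EnumMap<>(CheckpointOutcome.class);
    // Duration of the most recently completed recovery; -1 if none yet.
    private long lastRecoveryDurationMillis = -1L;

    /** Records that a task restart began at the given time. */
    void recoveryStarted(String taskName, long nowMillis) {
        restartsByTask.merge(taskName, 1, Integer::sum);
        recoveryStartMillis.put(taskName, nowMillis);
    }

    /** Records that the task is running again and stores the duration. */
    void recoveryFinished(String taskName, long nowMillis) {
        Long start = recoveryStartMillis.remove(taskName);
        if (start != null) {
            lastRecoveryDurationMillis = nowMillis - start;
        }
    }

    /** Records a checkpoint that was cancelled or failed due to a recovery. */
    void checkpointAffected(CheckpointOutcome outcome) {
        checkpointOutcomes.merge(outcome, 1, Integer::sum);
    }

    int restartCount(String taskName) {
        return restartsByTask.getOrDefault(taskName, 0);
    }

    long lastRecoveryDurationMillis() {
        return lastRecoveryDurationMillis;
    }

    int checkpointCount(CheckpointOutcome outcome) {
        return checkpointOutcomes.getOrDefault(outcome, 0);
    }
}
```

Timestamps are passed in by the caller rather than read from the system clock, which keeps the sketch deterministic and easy to test.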

            People

              Assignee: Unassigned
              Reporter: Zhenzhong Xu
              Votes: 0
              Watchers: 5

              Dates

                Created:
                Updated:
                Resolved: