FLINK-14206

Let fullRestart metric count fine grained restarts as well

    Details

      Description

      With fine-grained recovery introduced in 1.9.0, the fullRestart metric counts only how many times the entire graph has been restarted; fine-grained failure restarts are not included.

      As many users rely on this metric for failure detection, monitoring, and alerting, I propose making it count fine-grained restarts as well.

      The concrete proposal is:

      • Add a counter numberOfRestartsCounter in ExecutionGraph to count all restarts. The counter is not registered with any metric group.
      • Let fullRestart query the value of this counter instead of ExecutionGraph#globalModVersion.
      • Increment numberOfRestartsCounter in ExecutionGraph#incrementGlobalModVersion().
      • Increment numberOfRestartsCounter in AdaptedRestartPipelinedRegionStrategyNG#restartTasks(...), so that a restart is counted only when fine-grained recovery actually takes place.
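
      The proposal above could be sketched roughly as follows. This is a simplified, self-contained model and not the actual Flink code: the classes ExecutionGraphSketch, RegionRestartStrategySketch, and FullRestartGaugeSketch, and the methods incrementRestarts() and getNumberOfRestarts(), are hypothetical stand-ins; only numberOfRestartsCounter, globalModVersion, incrementGlobalModVersion(), and restartTasks(...) are names taken from the description.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongSupplier;

/** Simplified stand-in for ExecutionGraph (hypothetical, not the real class). */
class ExecutionGraphSketch {
    // Counts all restarts, global and fine-grained.
    // It is an internal counter and is not registered with any metric group.
    private final AtomicLong numberOfRestartsCounter = new AtomicLong();

    // Bumped only when the entire graph is restarted.
    private long globalModVersion;

    /** A full restart bumps both the global version and the restart counter. */
    long incrementGlobalModVersion() {
        numberOfRestartsCounter.incrementAndGet();
        return ++globalModVersion;
    }

    /** Hypothetical hook used by the fine-grained restart path. */
    void incrementRestarts() {
        numberOfRestartsCounter.incrementAndGet();
    }

    long getNumberOfRestarts() {
        return numberOfRestartsCounter.get();
    }

    long getGlobalModVersion() {
        return globalModVersion;
    }
}

/** Stand-in for AdaptedRestartPipelinedRegionStrategyNG (hypothetical). */
class RegionRestartStrategySketch {
    private final ExecutionGraphSketch graph;

    RegionRestartStrategySketch(ExecutionGraphSketch graph) {
        this.graph = graph;
    }

    /** Counts the restart at the point where the region restart is carried out. */
    void restartTasks() {
        graph.incrementRestarts();
        // ... the affected tasks would be reset and rescheduled here ...
    }
}

/** Stand-in for the fullRestart gauge: it reads the counter, not globalModVersion. */
class FullRestartGaugeSketch implements LongSupplier {
    private final ExecutionGraphSketch graph;

    FullRestartGaugeSketch(ExecutionGraphSketch graph) {
        this.graph = graph;
    }

    @Override
    public long getAsLong() {
        return graph.getNumberOfRestarts();
    }
}
```

      In this sketch, one full restart followed by two fine-grained restarts leaves globalModVersion at 1 while the gauge reports 3, which is the behavior the proposal asks of fullRestart.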

              People

              • Assignee: Zhu Zhu (zhuzh)
              • Reporter: Zhu Zhu (zhuzh)

                  Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 20m