  1. Mesos
  2. MESOS-8609

Create a metric to indicate how long agent takes to recover executors

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • None
    • 1.6.0
    • agent
    • Description:
      We noticed that the time an agent takes to recover after failover is not constant; it appears to depend on the number of executors. To give people some visibility into this, we should create metrics.

      What I propose specifically:
      - a gauge for the number of executor directories that need to be scanned;
      - a gauge for how long the agent takes to scan them.

      Alternatively, this could be done through a log line, but that may be harder to aggregate or monitor.
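
The two proposed gauges can be illustrated with a minimal sketch. It is not the Mesos implementation: the directory layout (`root/<framework>/<executor>/`), the function names, and the metric keys are all assumptions made up for this example. It only shows the idea of counting executor directories and timing the scan so that both values can be exported.

```python
import os
import tempfile
import time


def scan_executor_dirs(root):
    """Count executor directories under `root` and time the scan.

    Assumes a hypothetical layout root/<framework>/<executor>/; the real
    agent work directory is laid out differently and is not modelled here.
    """
    start = time.monotonic()
    count = 0
    for framework in os.scandir(root):
        if not framework.is_dir():
            continue
        for executor in os.scandir(framework.path):
            if executor.is_dir():
                count += 1
    elapsed = time.monotonic() - start
    # The two values a pair of gauges would expose (metric names are made up).
    return {
        "recovery/executor_dirs_scanned": count,
        "recovery/executor_scan_secs": elapsed,
    }


def make_demo_tree(root, frameworks=2, executors=3):
    """Create a throwaway directory tree so the sketch can be run as-is."""
    for f in range(frameworks):
        for e in range(executors):
            os.makedirs(os.path.join(root, f"framework-{f}", f"executor-{e}"))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        make_demo_tree(root)
        print(scan_executor_dirs(root))
```

Exposing both numbers together makes it possible to tell whether a slow recovery is explained by the number of directories or by something else, which a single log line would make harder to aggregate.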

        People

          Assignee: Zhitao Li (zhitao)
          Reporter: Zhitao Li (zhitao)
          Shepherd: James Peach
          Votes: 0
          Watchers: 1

          Dates

            Created:
            Updated:
            Resolved: