Mesos / MESOS-8609

Create a metric to indicate how long the agent takes to recover executors


    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.6.0
    • Component/s: agent
    • Labels:
    • Target Version/s:
    • Docs Text:
      We noticed that the time it takes the agent to recover after a failover is not constant; specifically, it grows with the number of executors. To give operators some visibility into this, we should create metrics.

      Specifically, I propose:
      - a gauge for how many executor directories need to be scanned;
      - a gauge for how long the agent takes to scan them.

      Alternatively, this could be done with a log line, but that would be harder to aggregate or monitor.
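      As a rough illustration of the two proposed gauges, the following C++17 sketch counts executor directories and times the scan. It is not Mesos code: the `RecoveryScanMetrics` struct and `scanExecutorDirectories` function are hypothetical, the sketch assumes each executor has its own subdirectory inside a directory named `executors`, and a plain struct stands in for the agent's real metrics registry.

```cpp
#include <chrono>
#include <cstddef>
#include <filesystem>

namespace fs = std::filesystem;

// Hypothetical stand-in for the two proposed gauges; not the Mesos metrics API.
struct RecoveryScanMetrics {
  std::size_t executor_directories = 0;  // number of executor directories found
  double scan_seconds = 0.0;             // elapsed time spent scanning them
};

// Assumed layout: every directory named "executors" holds one
// subdirectory per executor. Walks `root` and counts those subdirectories.
RecoveryScanMetrics scanExecutorDirectories(const fs::path& root) {
  RecoveryScanMetrics metrics;
  const auto start = std::chrono::steady_clock::now();

  if (fs::exists(root)) {
    for (const auto& entry : fs::recursive_directory_iterator(root)) {
      if (entry.is_directory() && entry.path().filename() == "executors") {
        // Count only directories; stray files are not executors.
        for (const auto& executor : fs::directory_iterator(entry.path())) {
          if (executor.is_directory()) {
            ++metrics.executor_directories;
          }
        }
      }
    }
  }

  // steady_clock is monotonic, so the duration is unaffected by clock changes.
  const auto end = std::chrono::steady_clock::now();
  metrics.scan_seconds = std::chrono::duration<double>(end - start).count();
  return metrics;
}
```

      Calling the function once during recovery yields both values; in a real agent they would be exported as gauges rather than returned, so they can be aggregated and monitored alongside other agent metrics.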


          People

          • Assignee:
            Zhitao Li
            Reporter:
            Zhitao Li
            Shepherd:
            James Peach
          • Votes:
            0
            Watchers:
            1
