Mesos / MESOS-38

Executor resource monitoring and local reporting of usage stats


Details

    • Type: Story
    • Status: Resolved
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: agent, containerization
    • Initial executor monitoring for Linux only; a dummy (no-op) monitor for OS X, with functionality to be filled in later.
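The Linux-only / no-op split above could be organized as one monitor per platform, chosen once at startup. The following is a minimal Python sketch under that assumption; `UsageMonitor`, `LinuxMonitor`, `NoopMonitor` and `make_monitor` are hypothetical names, not Mesos APIs (Mesos itself is written in C++).

```python
import sys
from typing import Optional, Protocol


class UsageMonitor(Protocol):
    """Hypothetical interface: usage for one executor, or None if unavailable."""

    def sample(self, executor_pid: int) -> Optional[dict]:
        ...


class LinuxMonitor:
    """Reads CPU counters for a single process from /proc (Linux only)."""

    def sample(self, executor_pid: int) -> Optional[dict]:
        try:
            with open(f"/proc/{executor_pid}/stat") as f:
                # comm (field 2) may contain spaces, so split after the last ')'.
                fields = f.read().rsplit(")", 1)[1].split()
        except OSError:
            return None  # process has exited or is unreadable
        # fields[0] is field 3 of proc(5); utime is field 14, stime field 15.
        return {"utime_ticks": int(fields[11]), "stime_ticks": int(fields[12])}


class NoopMonitor:
    """Placeholder for platforms (e.g. OS X) where monitoring is not implemented."""

    def sample(self, executor_pid: int) -> Optional[dict]:
        return None


def make_monitor(platform: str = sys.platform) -> UsageMonitor:
    # sys.platform is "linux" on Linux and "darwin" on OS X.
    if platform.startswith("linux"):
        return LinuxMonitor()
    return NoopMonitor()
```

Returning `None` from the no-op monitor lets callers treat "unsupported platform" and "process gone" the same way, so the reporting code needs no platform checks of its own.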

    Description

      Implement reporting of resource usage on executors and, for now, log it to a local log file. The eventual goal is to report these statistics to the Mesos master in order to build a timeline for the web UI, a top-like command-line interface, or both. This improvement ticket covers only local monitoring and log-file reporting; reporting to the master will be a later improvement ticket.
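Local reporting could be as simple as sampling each executor periodically and writing one line per sample to a log file. The sketch below assumes Python's standard `logging` module and a caller-supplied `sample` callable; `format_usage`, `report_usage` and the key=value line format are illustrative choices, not part of the ticket.

```python
import logging
import time
from typing import Callable, Optional


def format_usage(executor_id: str, usage: dict, timestamp: float) -> str:
    """Render one sample as key=value pairs, keys sorted for stable output."""
    fields = " ".join(f"{k}={usage[k]}" for k in sorted(usage))
    return f"ts={timestamp:.0f} executor={executor_id} {fields}"


def report_usage(
    logger: logging.Logger,
    executor_id: str,
    sample: Callable[[], Optional[dict]],
    iterations: int,
    interval_secs: float,
) -> int:
    """Sample usage `iterations` times, log each result, return lines logged."""
    logged = 0
    for i in range(iterations):
        usage = sample()
        if usage is not None:  # None: monitoring unsupported or executor gone
            logger.info(format_usage(executor_id, usage, time.time()))
            logged += 1
        if i + 1 < iterations:  # no need to sleep after the last sample
            time.sleep(interval_secs)
    return logged
```

For example, calling `logging.basicConfig(filename="executor_usage.log", level=logging.INFO)` beforehand would route these lines to a local file (the file name is hypothetical). A line-oriented key=value format keeps the log easy to grep now and easy to parse later, once the statistics are sent to the master.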

      With the current version of Mesos, it is not possible to monitor individual tasks. The best such a system can do is therefore to monitor each executor and aggregate resource usage over all of that executor's tasks and resource allocations. If a framework runs each job in its own executor (a 1-to-1 mapping of jobs to executors), the aggregate statistics will be more meaningful.
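Because usage can only be attributed to an executor, per-process figures have to be summed into one executor-level record. A minimal sketch, assuming per-process CPU time (in clock ticks) and resident memory (in pages) are already available; the `ProcessUsage` and `ExecutorUsage` types and the `aggregate` and `cpu_fraction` helpers are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class ProcessUsage:
    pid: int
    cpu_ticks: int  # user + system CPU time, in clock ticks
    rss_pages: int  # resident set size, in pages


@dataclass(frozen=True)
class ExecutorUsage:
    executor_id: str
    num_processes: int
    cpu_ticks: int
    rss_pages: int


def aggregate(executor_id: str, processes: Iterable[ProcessUsage]) -> ExecutorUsage:
    """Sum per-process usage into one executor-level record.

    Tasks cannot be told apart, so every process attributed to the
    executor counts toward the same total.
    """
    procs = list(processes)
    return ExecutorUsage(
        executor_id=executor_id,
        num_processes=len(procs),
        cpu_ticks=sum(p.cpu_ticks for p in procs),
        rss_pages=sum(p.rss_pages for p in procs),
    )


def cpu_fraction(prev: ExecutorUsage, curr: ExecutorUsage,
                 elapsed_secs: float, ticks_per_sec: int) -> float:
    """Average number of CPUs used between two samples (1.0 = one busy core)."""
    return (curr.cpu_ticks - prev.cpu_ticks) / (ticks_per_sec * elapsed_secs)
```

On Linux, `ticks_per_sec` would normally be `os.sysconf("SC_CLK_TCK")`. Processes that exit between two samples drop out of the sum, so the delta can undercount and should be read as approximate.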

      Reporting will be available for both LXC isolation and process-based isolation. LXC isolation makes the task easier, because the container already encloses all of the executor's processes. Process-based isolation is more difficult, because processes can be re-parented out of the executor's process tree (e.g. after a double fork). Their session ID and process group ID, however, will usually still match the executor's, except in the uncommon case where a process resets both.
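For process-based isolation, the session/process-group heuristic can be sketched on Linux by reading `/proc/<pid>/stat`, whose field layout is documented in proc(5). This is a hypothetical sketch, not the Mesos implementation: a process is attributed to the executor if it shares either its session ID or its process group ID, so it is lost only when both are reset.

```python
import os
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class ProcStat:
    pid: int
    ppid: int
    pgrp: int
    session: int
    utime: int  # clock ticks spent in user mode
    stime: int  # clock ticks spent in kernel mode


def parse_stat(line: str) -> ProcStat:
    """Parse the contents of /proc/<pid>/stat."""
    # comm (field 2) is wrapped in parentheses and may itself contain
    # spaces or ')', so split on the *last* ')'.
    head, rest = line.rsplit(")", 1)
    pid = int(head.split("(", 1)[0])
    f = rest.split()  # f[0] is field 3 (state), so field n is f[n - 3]
    return ProcStat(pid=pid, ppid=int(f[1]), pgrp=int(f[2]), session=int(f[3]),
                    utime=int(f[11]), stime=int(f[12]))


def executor_processes(stats: Iterable[ProcStat], executor: ProcStat) -> List[ProcStat]:
    """Processes sharing the executor's session ID or process group ID.

    A double-forked child is re-parented (usually to init), so following
    ppid links from the executor misses it, but it keeps the executor's
    session and process group unless it changes both (e.g. via setsid()).
    """
    return [s for s in stats
            if s.session == executor.session or s.pgrp == executor.pgrp]


def read_all_stats() -> List[ProcStat]:
    """Snapshot every visible process on a Linux host."""
    out = []
    for name in os.listdir("/proc"):
        if name.isdigit():
            try:
                with open(f"/proc/{name}/stat") as fh:
                    out.append(parse_stat(fh.read()))
            except OSError:
                continue  # process exited mid-scan or is unreadable
    return out
```

Matching on IDs rather than walking parent links is what lets a double-forked child, now parented to init, still be counted against its executor.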

      When usage statistics are eventually reported to the Mesos master, it may be possible to use them to oversubscribe slave nodes.


            People

              Benjamin Mahler (bmahler)
              Sam Whitlock (_sam_)
              Votes: 0
              Watchers: 3
