Mesos / MESOS-7426

Support for agent lifecycle management.

    Details

    • Type: Epic
    • Status: Accepted
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: agent
    • Epic Name: Mesos Agent Lifecycle

      Description

      This epic coordinates the work to introduce agent lifecycle management in Mesos, allowing a framework to be notified when an agent node fails. The existing Event::Failure is not enough for a framework to know that a given agent node is never coming back.

      The primary motivations for this feature are:

      • Currently, when an agent running a task fails, operator intervention (a manual step) is needed to remove the node via a configuration API exposed by the framework, e.g., `dcos cassandra node replace` for the Cassandra framework. This must be done once for every stateful framework running on the cluster.
      • When an agent is marked as unhealthy, the removal rate is bounded if the master's `--agent_removal_rate_limit` option is set. This is especially problematic for operators who rely on EC2 autoscaling groups or who burst workloads to another cloud.
      • When the fault domain associated with an agent changes (e.g., it is moved from an unallocated rack to an allocated rack), there is no feedback mechanism for the framework.
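
      To make the cost of the removal rate limit concrete, here is a rough sketch that estimates how long it would take to remove a batch of failed agents under such a limit. It assumes the limit is written as a string like `1/10mins` (permits per period); the helper names `parse_rate_limit` and `seconds_to_remove` and the set of accepted units are hypothetical and exist only for this illustration. It ignores any burst allowance, so treat the result as an approximation, not a description of how the master actually schedules removals.

```python
import re

# Assumed duration units, for illustration only.
_UNIT_SECONDS = {"secs": 1, "mins": 60, "hrs": 3600}


def parse_rate_limit(value):
    """Parse a string such as '1/10mins' into (permits, period_in_seconds)."""
    match = re.fullmatch(r"(\d+)/(\d+)(secs|mins|hrs)", value)
    if match is None:
        raise ValueError("unrecognized rate limit: %r" % value)
    permits, amount, unit = match.groups()
    if int(permits) == 0:
        raise ValueError("rate limit must permit at least one removal")
    return int(permits), int(amount) * _UNIT_SECONDS[unit]


def seconds_to_remove(failed_agents, rate_limit):
    """Estimate the time needed to remove `failed_agents` agents at `rate_limit`."""
    permits, period = parse_rate_limit(rate_limit)
    return failed_agents * period / permits


if __name__ == "__main__":
    # Hypothetical scenario: an autoscaling group terminates 50 agents at once.
    hours = seconds_to_remove(50, "1/10mins") / 3600
    print("about %.1f hours to remove 50 agents" % hours)
```

      Under these assumptions, a scale-down of 50 instances with a limit of one removal per ten minutes would take on the order of eight hours before every terminated agent is removed, which is why the bound is painful for autoscaling and cloud-bursting setups.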

              People

              • Assignee: Anand Mazumdar (anandmazumdar)
              • Reporter: Anand Mazumdar (anandmazumdar)
              • Votes: 0
              • Watchers: 2
