Mesos / MESOS-8933

Stop sending offers from agents in draining mode

Details

    • Type: Improvement
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved

    Description

      Background:

      At Yelp, we use Mesos to run microservices (Marathon), batch jobs (Chronos and custom frameworks), Spark (the Spark Mesos framework), and so on. We also autoscale the number of agents in our cluster based on current demand and some other metrics. We use the Mesos maintenance primitives to gracefully shut down Mesos agents.

      Problem:

      When we want to shut down an agent, we first move it into draining mode. This lets us gracefully terminate the microservices and other tasks running on it. However, Mesos continues to send offers from that agent, with unavailability set. Frameworks such as Marathon, Chronos, and Spark ignore the unavailability and schedule tasks on the agent anyway. To prevent this, we allocate all the available resources on that agent to a role that is not used by any framework. This approach is not foolproof: there is still a race between the moment we move the agent into draining mode and the moment we allocate all of its available resources to the maintenance role.
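For illustration only, here is a minimal sketch of what honoring unavailability on the framework side could look like. It assumes offers arrive as JSON-like dictionaries in which an agent scheduled for maintenance carries an `unavailability` field, as described above. The function name `split_offers`, the hostnames, and the sample values are hypothetical, and this is not how Marathon, Chronos, or Spark actually process offers.

```python
from typing import Any, Dict, List, Tuple

Offer = Dict[str, Any]


def split_offers(offers: List[Offer]) -> Tuple[List[Offer], List[Offer]]:
    """Partition offers into (usable, to_decline).

    Assumption: an offer from an agent with scheduled maintenance carries
    an "unavailability" key; offers without that key are treated as usable.
    """
    usable: List[Offer] = []
    to_decline: List[Offer] = []
    for offer in offers:
        if "unavailability" in offer:
            to_decline.append(offer)
        else:
            usable.append(offer)
    return usable, to_decline


# Hypothetical sample offers; IDs, hostnames, and timestamps are made up.
offers = [
    {"id": {"value": "offer-1"}, "hostname": "agent-a.example.com"},
    {
        "id": {"value": "offer-2"},
        "hostname": "agent-b.example.com",
        "unavailability": {"start": {"nanoseconds": 1526947200000000000}},
    },
]

usable, to_decline = split_offers(offers)
print("usable:", [o["id"]["value"] for o in usable])
print("decline:", [o["id"]["value"] for o in to_decline])
```

Because each framework would need its own version of such a check, filtering on the Mesos side is the more attractive place to solve this.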

      Proposal:

      It would be nice if Mesos stopped sending offers from agents in draining mode. Something like this: https://gist.github.com/sagar8192/0b9dbccc908818f8f9f5a18d1f634513 I don't know whether this affects the allocator. We could put this behind a flag (something like --do-not-send-offers-from-agents-in-draining-mode) and make it optional.
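The proposed behavior can be sketched as a toy model, independent of the linked gist and of the real Mesos master and allocator code. Everything here is a hypothetical stand-in: the `Agent` class, the `offerable_agents` function, and the `skip_draining` parameter, which plays the role of the suggested flag.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Agent:
    agent_id: str
    draining: bool    # True once the agent has been moved into draining mode
    free_cpus: float  # simplified stand-in for the agent's unallocated resources


def offerable_agents(agents: List[Agent], skip_draining: bool) -> List[Agent]:
    """Return the agents whose free resources may be offered to frameworks.

    With skip_draining=False (today's behavior) draining agents still
    produce offers; with skip_draining=True they are filtered out.
    """
    result = []
    for agent in agents:
        if agent.free_cpus <= 0:
            continue  # nothing left to offer
        if skip_draining and agent.draining:
            continue  # proposed behavior: no offers from draining agents
        result.append(agent)
    return result


agents = [
    Agent("agent-a", draining=False, free_cpus=4.0),
    Agent("agent-b", draining=True, free_cpus=8.0),
]

print([a.agent_id for a in offerable_agents(agents, skip_draining=False)])
print([a.agent_id for a in offerable_agents(agents, skip_draining=True)])
```

Keeping the filter behind an opt-in switch means clusters that rely on the existing offers-with-unavailability behavior would see no change by default.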

          People

            Assignee: Unassigned
            Reporter: Sagar Sadashiv Patwardhan (sagar8192)
            Votes: 1
            Watchers: 4
