Mesos / MESOS-2246

Improve slave health-checking


Details

    • Type: Epic
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.22.0
    • Component/s: agent, master
    • Labels: None
    • Epic Name: slave health-checking

    Description

      In the event of a network partition or another systemic issue, we may see widespread slave removal. There are several approaches we can take to mitigate this, including but not limited to:

      • Rate limit slave removal.
      • Change how we do health checking so that it does not rely on a single point of view.
      • Work with frameworks to determine the SLA of running services before removing a slave.
      • Provide manual control to allow operator intervention.
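      The first two mitigations above can be combined in a small guard. The sketch below is hypothetical and is not the Mesos master's implementation (which is C++): the class SlaveRemovalGuard, its parameters (num_observers, max_removals, window_secs) and the idea of "observers" that report a slave as unhealthy are all illustrative assumptions. A slave may be removed only when a strict majority of observers agree it is unhealthy, and no more than max_removals slaves are removed in any sliding window of window_secs seconds. A refused removal is simply deferred, which leaves room for operator intervention.

```python
import time
from collections import deque


class SlaveRemovalGuard:
    """Hypothetical guard: majority agreement plus a sliding-window rate limit."""

    def __init__(self, num_observers, max_removals, window_secs, clock=time.monotonic):
        self.num_observers = num_observers
        self.max_removals = max_removals
        self.window_secs = window_secs
        self.clock = clock  # injectable so the behaviour can be tested
        self._removal_times = deque()  # timestamps of recent removals, oldest first
        self._reports = {}  # slave_id -> set of observers reporting it unhealthy

    def report_unhealthy(self, slave_id, observer_id):
        self._reports.setdefault(slave_id, set()).add(observer_id)

    def report_healthy(self, slave_id, observer_id):
        self._reports.get(slave_id, set()).discard(observer_id)

    def _majority_agrees(self, slave_id):
        # Avoid relying on a single point of view: need a strict majority.
        votes = len(self._reports.get(slave_id, ()))
        return votes > self.num_observers // 2

    def _within_rate_limit(self, now):
        # Forget removals that have aged out of the sliding window.
        while self._removal_times and now - self._removal_times[0] >= self.window_secs:
            self._removal_times.popleft()
        return len(self._removal_times) < self.max_removals

    def try_remove(self, slave_id):
        """Return True if the slave may be removed now; False means defer."""
        if not self._majority_agrees(slave_id):
            return False
        now = self.clock()
        if not self._within_rate_limit(now):
            return False  # too many recent removals; retry later or escalate
        self._removal_times.append(now)
        self._reports.pop(slave_id, None)
        return True


if __name__ == "__main__":
    fake_now = [0.0]
    guard = SlaveRemovalGuard(num_observers=3, max_removals=1, window_secs=600,
                              clock=lambda: fake_now[0])
    guard.report_unhealthy("s1", "m1")
    print(guard.try_remove("s1"))  # False: only 1 of 3 observers agree
    guard.report_unhealthy("s1", "m2")
    print(guard.try_remove("s1"))  # True: majority agrees, under the limit
    guard.report_unhealthy("s2", "m1")
    guard.report_unhealthy("s2", "m2")
    print(guard.try_remove("s2"))  # False: rate limit reached in this window
    fake_now[0] = 600.0
    print(guard.try_remove("s2"))  # True: the earlier removal left the window
```

      During a partition, many slaves may look unhealthy to one observer at once. The majority check filters one-sided views, and the rate limit bounds the damage when the majority itself is wrong.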


            People

              vinodkone Vinod Kone
              dhamon Dominic Hamon
              Votes: 0
              Watchers: 3