Mesos / MESOS-305

Inform the frameworks / slaves about a master failover

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.13.0
    • Component/s: None
    • Labels: None

      Description

      With the recent changes in the master detector code, we no longer send 'NoMasterDetected' to the scheduler driver, which in turn means the 'disconnected' scheduler callback is never invoked.

      At Twitter this manifested as a spew of LOST tasks whenever a master failover happened. The scheduler holds on to offers for a while and does not learn that those offers have become invalid until after it has launched tasks against them. Although this is inherently a race, the window should be kept as small as possible by informing the scheduler of the master failover.
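
To illustrate why the 'disconnected' callback matters, here is a minimal, self-contained sketch of how a scheduler might discard held offers once it is told about a failover. It does not use the Mesos bindings: the `OfferCache` class, its method names, and the dictionary-based offer payloads are all hypothetical, chosen only to mirror the shape of the scheduler callbacks mentioned above.

```python
class OfferCache:
    """Hypothetical helper that tracks offers a scheduler is holding on to."""

    def __init__(self):
        self.connected = False
        self.offers = {}  # offer id -> offer payload (illustrative dicts)

    def registered(self):
        # Called when the scheduler (re)connects to a master.
        self.connected = True

    def resource_offers(self, offers):
        # Remember new offers so they can be used for a later launch.
        self.offers.update(offers)

    def disconnected(self):
        # Called on master failover: every held offer is now invalid,
        # so drop them all instead of launching tasks that would be LOST.
        self.connected = False
        dropped = sorted(self.offers)
        self.offers.clear()
        return dropped

    def launch(self, offer_id):
        # Refuse to launch against an unknown offer or while disconnected.
        if not self.connected or offer_id not in self.offers:
            return False
        del self.offers[offer_id]
        return True


cache = OfferCache()
cache.registered()
cache.resource_offers({"offer-1": {"cpus": 2}, "offer-2": {"cpus": 4}})
launched_before = cache.launch("offer-1")   # offer is still valid
dropped = cache.disconnected()              # failover invalidates the rest
launched_after = cache.launch("offer-2")    # rejected locally, no LOST task
```

Without the 'disconnected' notification, the last call in this sketch would go ahead with a stale offer, which is the situation the description reports.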

            People

            • Assignee: Benjamin Mahler (bmahler)
            • Reporter: Vinod Kone (vinodkone)