Mesos / MESOS-6676

Always re-link with scheduler during re-registration.

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.0.3, 1.1.1, 1.2.0
    • Component/s: master

    Description

      Scenario:

      1. Framework registers with master using a non-zero failover_timeout and is assigned a FrameworkID.
      2. The master sees an ExitedEvent for the master->scheduler link. This could happen due to some transient network error, e.g., 1-way partition. The master sends a FrameworkErrorMessage to the framework. The master marks the framework as disconnected, but keeps the Framework* for it around in frameworks.registered.
      3. The framework doesn't receive the FrameworkErrorMessage because it is dropped by the network.
      4. The scheduler might receive an ExitedEvent for the scheduler -> master link, but it ignores this anyway (see MESOS-887).
      5. The scheduler sees a new-master-detected event and re-registers with the master. It does not set the force flag. This means we follow this code path in the master, which does not relink with the scheduler.

      The result is that scheduler re-registration succeeds, but the master -> scheduler link is never re-established.
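
The scenario above can be replayed with a small toy model. This is only a sketch under stated assumptions, not the Mesos master code: `ToyMaster`, its methods, the `always_relink` switch, and the scheduler PID string are all hypothetical names invented for illustration. It shows a framework ending up marked as connected while the master -> scheduler link is missing, and how relinking on every re-registration (as the issue title proposes) avoids that state.

```python
class ToyMaster:
    """Hypothetical stand-in for the master; not the real Mesos implementation."""

    def __init__(self, always_relink):
        # always_relink=False models the buggy behaviour described in the issue,
        # always_relink=True models the proposed fix.
        self.always_relink = always_relink
        self.links = set()      # scheduler PIDs with a live master -> scheduler link
        self.frameworks = {}    # framework id -> {"pid": ..., "connected": ...}

    def register(self, framework_id, scheduler_pid):
        # Step 1: initial registration creates the link.
        self.frameworks[framework_id] = {"pid": scheduler_pid, "connected": True}
        self.links.add(scheduler_pid)

    def on_exited(self, scheduler_pid):
        # Step 2: the link breaks; the framework is marked disconnected
        # but its entry is kept around.
        self.links.discard(scheduler_pid)
        for framework in self.frameworks.values():
            if framework["pid"] == scheduler_pid:
                framework["connected"] = False

    def reregister(self, framework_id, scheduler_pid, force=False):
        # Step 5: in this toy model the non-force path relinks only
        # when always_relink is set.
        framework = self.frameworks[framework_id]
        if force or self.always_relink:
            self.links.add(scheduler_pid)
        framework["pid"] = scheduler_pid
        framework["connected"] = True


def replay_scenario(always_relink):
    """Return (framework marked connected, master -> scheduler link present)."""
    pid = "scheduler@10.0.0.5:4040"  # made-up PID for illustration
    master = ToyMaster(always_relink)
    master.register("fw-1", pid)
    master.on_exited(pid)
    master.reregister("fw-1", pid, force=False)
    framework = master.frameworks["fw-1"]
    return framework["connected"], framework["pid"] in master.links
```

With `always_relink=False`, `replay_scenario` reports the framework as connected but without a link, which is the inconsistent state described above; with `always_relink=True`, both are restored.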

            People

              Assignee: Neil Conway (neilc)
              Reporter: Neil Conway (neilc)
              Shepherd: Vinod Kone