Mesos / MESOS-6676

Always re-link with scheduler during re-registration.


    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.0.3, 1.1.1, 1.2.0
    • Component/s: master
    • Labels:

      Description

      Scenario:

      1. Framework registers with master using a non-zero failover_timeout and is assigned a FrameworkID.
      2. The master sees an ExitedEvent for the master->scheduler link. This could happen due to a transient network error, e.g., a one-way partition. The master sends a FrameworkErrorMessage to the framework. The master marks the framework as disconnected, but keeps the Framework* for it in frameworks.registered.
      3. The framework doesn't receive the FrameworkErrorMessage because it is dropped by the network.
      4. The scheduler might receive an ExitedEvent for the scheduler->master link, but it ignores this event anyway (see MESOS-887).
      5. The scheduler sees a new-master-detected event and re-registers with the master. It does not set the force flag, so the master takes the non-forced re-registration code path, which does not relink with the scheduler.

      The result is that scheduler re-registration succeeds, but the master->scheduler link is never re-established.
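      The scenario above can be replayed against a toy model of the master's bookkeeping. This is a hypothetical sketch, not Mesos code: `Master`, `FrameworkState`, `register`, `on_exited`, `reregister`, `run_scenario`, the `linked` flag, and the framework ID `fw-1` are all invented for illustration. It also assumes, reading step 5, that only the forced re-registration path relinks in the buggy master. The model shows the framework ending up marked connected with no master->scheduler link, and how relinking unconditionally during re-registration (the fix named in the title) closes that gap.

```python
from dataclasses import dataclass, field


@dataclass
class FrameworkState:
    connected: bool = False  # master considers the framework connected
    linked: bool = False     # master->scheduler link is established


@dataclass
class Master:
    # Hypothetical model. False = buggy behavior, True = "always re-link" fix.
    always_relink: bool
    frameworks: dict = field(default_factory=dict)

    def register(self, framework_id):
        # Step 1: initial registration links the master to the scheduler.
        self.frameworks[framework_id] = FrameworkState(connected=True, linked=True)

    def on_exited(self, framework_id):
        # Step 2: ExitedEvent for the master->scheduler link. The framework is
        # marked disconnected but kept in the registered set. The
        # FrameworkErrorMessage sent here is assumed lost (step 3).
        state = self.frameworks[framework_id]
        state.connected = False
        state.linked = False

    def reregister(self, framework_id, force):
        # Step 5: re-registration. Assumption of this model: the buggy master
        # relinks only when force is set; the fixed master always relinks.
        state = self.frameworks[framework_id]
        if force or self.always_relink:
            state.linked = True
        state.connected = True


def run_scenario(always_relink):
    """Replay steps 1-5 and return (connected, linked) for the framework."""
    master = Master(always_relink=always_relink)
    master.register("fw-1")
    master.on_exited("fw-1")
    master.reregister("fw-1", force=False)  # scheduler omits the force flag
    state = master.frameworks["fw-1"]
    return state.connected, state.linked


if __name__ == "__main__":
    print("buggy:", run_scenario(always_relink=False))  # buggy: (True, False)
    print("fixed:", run_scenario(always_relink=True))   # fixed: (True, True)
```

      Keeping "connected" and "linked" as separate flags is what makes the bug visible in the model: after a non-forced re-registration, the master's view of the framework and the actual state of the link disagree.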

              People

              • Assignee:
                Neil Conway (neilc)
                Reporter:
                Neil Conway (neilc)
                Shepherd:
                Vinod Kone
              • Votes:
                0
                Watchers:
                3