  1. Mesos
  2. MESOS-9763

Race between two re-subscriptions against an empty master.

Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: master, scheduler api

    Description

      Currently, subscription (and re-subscription) is not atomic.
      It consists of three steps performed by two actors:
       - Validating the supplied FrameworkInfo against the master state (which possibly includes an existing FrameworkInfo)
       - Authorizing the (re-)subscribing framework
       - Applying the update

      A partitioned or buggy (or both) framework can trigger a race by sending two SUBSCRIBE calls with differing FrameworkInfos on master failover.

      One of the possible sequences of events:
      1. FrameworkInfo A is validated by the master (which has no data about this framework).
      2. Conflicting FrameworkInfo B is validated by the master (which still stores no data about this framework, as scheduler A is not even authorized yet).
      3. Scheduler A is authorized.
      4. Scheduler B is authorized.
      5. FrameworkInfo A is applied.
      6. The master attempts to apply FrameworkInfo B, which is no longer valid after the previous step.
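
The sequence above can be replayed in a small toy model. This is only a sketch and not Mesos code: `ToyMaster`, its three methods, the two-field `FrameworkInfo` stand-in, and the `fw-1` / `principal-a` / `principal-b` values are all hypothetical names made up for illustration. It assumes that the apply step keeps an already stored principal without re-validating.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class FrameworkInfo:
    # Toy stand-in for the real FrameworkInfo message; only two fields are modeled.
    framework_id: str
    principal: str


class ToyMaster:
    """Hypothetical master that runs validate / authorize / apply as separate steps."""

    def __init__(self) -> None:
        self.frameworks: Dict[str, FrameworkInfo] = {}

    def validate(self, info: FrameworkInfo) -> Optional[str]:
        # Step "validate": compares against whatever the master stores right now.
        known = self.frameworks.get(info.framework_id)
        if known is not None and known.principal != info.principal:
            return "changing the principal is not allowed"
        return None

    def authorize(self, info: FrameworkInfo) -> bool:
        # Step "authorize": done by a second actor; always succeeds in this model.
        return True

    def apply(self, info: FrameworkInfo) -> None:
        # Step "apply": no re-validation (assumption of this sketch). A framework
        # that is already known keeps its stored principal, so a conflicting
        # value is dropped without any error.
        if info.framework_id not in self.frameworks:
            self.frameworks[info.framework_id] = info


def run_race() -> dict:
    master = ToyMaster()  # empty state, as right after a failover
    a = FrameworkInfo("fw-1", "principal-a")
    b = FrameworkInfo("fw-1", "principal-b")

    error_a = master.validate(a)        # event 1: nothing stored, A passes
    error_b = master.validate(b)        # event 2: still nothing stored, B passes
    authorized_a = master.authorize(a)  # event 3
    authorized_b = master.authorize(b)  # event 4
    master.apply(a)                     # event 5
    master.apply(b)                     # event 6: B's principal silently ignored

    return {
        "error_a": error_a,
        "error_b": error_b,
        "authorized": authorized_a and authorized_b,
        "stored_principal": master.frameworks["fw-1"].principal,
        # Validating B against the final state shows the error B never received.
        "late_error_b": master.validate(b),
    }


result = run_race()
```

In this model both calls pass validation because each one is checked against the empty state, and only a validation run after event 5 would have caught the conflict.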

      One simple example is an attempt to re-subscribe with two different principals: currently, scheduler B's principal is silently ignored at step 6 (instead of a validation error being sent to B).
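
For contrast, the outcome described as expected (a validation error reaching B) can be sketched by repeating the validation immediately before the update is applied. This is an illustration under assumptions, not a proposed patch: the `validate` and `apply_with_recheck` helpers and the plain dictionary standing in for master state are hypothetical.

```python
from typing import Dict, Optional


def validate(stored: Dict[str, str], framework_id: str, principal: str) -> Optional[str]:
    # Returns an error message if the principal conflicts with the stored one.
    known = stored.get(framework_id)
    if known is not None and known != principal:
        return f"principal cannot change from {known!r} to {principal!r}"
    return None


def apply_with_recheck(stored: Dict[str, str], framework_id: str, principal: str) -> Optional[str]:
    # Re-validate against the state as it is at apply time, then apply.
    # In this single-threaded sketch the check and the update cannot be interleaved.
    error = validate(stored, framework_id, principal)
    if error is not None:
        return error
    stored.setdefault(framework_id, principal)
    return None


stored: Dict[str, str] = {}  # empty master state, as after a failover
result_a = apply_with_recheck(stored, "fw-1", "principal-a")  # applied, no error
result_b = apply_with_recheck(stored, "fw-1", "principal-b")  # rejected with an error
```

Here the second call gets an explicit error instead of being silently ignored, and the stored principal remains the one from the first call.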

      At the time of writing, I'm not sure whether this race causes other problems.

            People

              Assignee: Unassigned
              Reporter: Andrei Sekretenko (asekretenko)