Mesos / MESOS-9763

Race between two re-subscriptions against an empty master.



    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: master, scheduler api
    • Labels:


      Currently, subscription (and re-subscription) is not atomic.
      It consists of three steps performed by two actors:
       - Validating the supplied FrameworkInfo against the master state (which possibly includes an existing FrameworkInfo)
       - Authorizing the (re-)subscribing framework
       - Applying the update

      A partitioned or buggy (or both) framework can trigger a race by sending two SUBSCRIBE calls with differing FrameworkInfos on master failover.

      One of the possible sequences of events:
      1. FrameworkInfo A is validated by the master (which has no data about this framework).
      2. Conflicting FrameworkInfo B is validated by the master (which still stores no data about this framework, as scheduler A is not even authorized yet).
      3. Scheduler A is authorized.
      4. Scheduler B is authorized.
      5. FrameworkInfo A is applied.
      6. The master attempts to apply FrameworkInfo B, which is no longer valid after the previous step.

      One simple example is an attempt to re-subscribe with two different principals: scheduler B's principal is currently silently ignored at step 6, instead of B receiving a validation error.
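
      The interleaving can be reproduced with a small toy model. This is only a sketch and not Mesos code: `ToyMaster`, `FrameworkInfo` (reduced to two fields), and `run_race` are hypothetical names, authorization always succeeds, and the "first writer wins" behaviour in `apply` is an assumption standing in for the silent ignore described above.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class FrameworkInfo:
    # Hypothetical, heavily reduced stand-in for the real FrameworkInfo.
    framework_id: str
    principal: str


class ToyMaster:
    """Toy model of the three-step subscription; not Mesos code."""

    def __init__(self) -> None:
        self.frameworks: Dict[str, FrameworkInfo] = {}

    def validate(self, info: FrameworkInfo) -> Optional[str]:
        # Step 1: compare against whatever the master stores right now.
        existing = self.frameworks.get(info.framework_id)
        if existing is not None and existing.principal != info.principal:
            return "principal differs from the stored FrameworkInfo"
        return None

    def authorize(self, info: FrameworkInfo) -> bool:
        # Step 2: always succeeds in this toy model.
        return True

    def apply(self, info: FrameworkInfo) -> None:
        # Step 3 (assumed behaviour, for illustration only): the first
        # writer wins and a later conflicting principal is dropped
        # without any error being reported.
        self.frameworks.setdefault(info.framework_id, info)


def run_race() -> dict:
    master = ToyMaster()
    a = FrameworkInfo("fw-1", "principal-a")
    b = FrameworkInfo("fw-1", "principal-b")

    error_a = master.validate(a)  # 1. A validated against an empty master
    error_b = master.validate(b)  # 2. B validated against an empty master
    assert master.authorize(a)    # 3. scheduler A authorized
    assert master.authorize(b)    # 4. scheduler B authorized
    master.apply(a)               # 5. FrameworkInfo A applied
    master.apply(b)               # 6. B applied although it is now stale

    return {
        "error_a": error_a,
        "error_b": error_b,
        "stored_principal": master.frameworks["fw-1"].principal,
        # What validation would report if it were repeated at apply time.
        "error_b_if_revalidated": master.validate(b),
    }


if __name__ == "__main__":
    print(run_race())
```

      In this model both validations pass because they run before either apply, so B never sees an error even though repeating the validation after step 5 would reject it.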

      At the time of writing, I'm not sure whether this race causes other problems.





              • Assignee:
                asekretenko Andrei Sekretenko


                • Created: