[MESOS-9763] Race between two re-subscriptions against an empty master. - ASF JIRA

Details

Type: Bug
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: None
Fix Version/s: None
Component/s: master, scheduler api
Labels:
- foundations

Description

Currently, subscription (and re-subscription) is not atomic.
It consists of three steps performed by two actors:
- Validating the supplied FrameworkInfo against the master state (which possibly includes an existing FrameworkInfo)
- Authorizing the (re-)subscribing framework
- Applying the update

A partitioned or buggy (or both) framework can trigger a race by sending two SUBSCRIBE calls with differing FrameworkInfos after a master failover.

One of the possible sequences of events:
1. FrameworkInfo A is validated by the master (which has no data about this framework).
2. Conflicting FrameworkInfo B is validated by the master (which still stores no data about this framework, as Scheduler A is not even authorized yet).
3. Scheduler A is authorized.
4. Scheduler B is authorized.
5. FrameworkInfo A is applied.
6. The master attempts to apply FrameworkInfo B, which is no longer valid after the previous step.

One simple example is an attempt to re-subscribe with two different principals: currently, scheduler B's principal is silently ignored at step 6 (instead of a validation error being sent to B).
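
The interleaving above can be reproduced with a toy model. The sketch below is not Mesos code: `ToyMaster`, `FrameworkInfo` (reduced to an ID and a principal), `run_race`, and the rule that a principal mismatch is the only validation error are all hypothetical stand-ins chosen for illustration. It only shows why validating before authorization, and not again before applying, lets a conflicting FrameworkInfo through.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass(frozen=True)
class FrameworkInfo:
    # Hypothetical, heavily reduced stand-in for the real FrameworkInfo.
    framework_id: str
    principal: str


@dataclass
class ToyMaster:
    # framework_id -> FrameworkInfo applied so far (empty right after failover).
    frameworks: Dict[str, FrameworkInfo] = field(default_factory=dict)
    log: List[str] = field(default_factory=list)

    def validate(self, info: FrameworkInfo) -> Optional[str]:
        """Step 1: check the supplied info against the current master state."""
        known = self.frameworks.get(info.framework_id)
        if known is not None and known.principal != info.principal:
            return "principal differs from the stored FrameworkInfo"
        return None

    def authorize(self, info: FrameworkInfo) -> bool:
        """Step 2: stand-in for asynchronous authorization (always allows)."""
        return True

    def apply(self, info: FrameworkInfo) -> None:
        """Step 3: apply the update without validating again."""
        if info.framework_id not in self.frameworks:
            self.frameworks[info.framework_id] = info
            self.log.append(f"stored principal {info.principal}")
        else:
            # Models the reported behaviour: the stored principal is kept
            # and the conflicting one is dropped without an error.
            self.log.append(f"ignored principal {info.principal}")


def run_race():
    master = ToyMaster()  # empty master, as after a failover
    a = FrameworkInfo("fw-1", "principal-a")
    b = FrameworkInfo("fw-1", "principal-b")

    # Steps 1-2: both infos are validated against the same empty state.
    early_errors = [master.validate(a), master.validate(b)]
    # Steps 3-4: both schedulers are authorized.
    authorized = [master.authorize(a), master.authorize(b)]
    # Steps 5-6: A is applied first, then B, which is no longer valid.
    master.apply(a)
    master.apply(b)

    # Validating B now would have caught the conflict.
    late_error = master.validate(b)
    return master, early_errors, authorized, late_error
```

In this model both early validations pass because the master state is still empty, while validating B after A has been applied reports the conflict. That is the error scheduler B never receives in the sequence above.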

At the time of writing, I'm not sure whether this race causes other problems.

Issue Links

is related to

MESOS-7258 Provide scheduler calls to subscribe to additional roles and unsubscribe from roles.

Resolved

People

Assignee: Unassigned

Reporter: Andrei Sekretenko

Votes: 0

Watchers: 2

Dates

Created: 02/May/19 18:44

Updated: 13/Jun/19 12:19