Details
- Bug
- Status: Resolved
- Major
- Resolution: Fixed
- 1.9.0
- None
- Foundations: RI-20 59, Studio 4: RI-21 60, Studio 4: RI-21 61, Studio 4: RI-22 62, Studio 4: RI-23 64
- 8
Description
Observed an example of such reordering on a testing cluster with a V1 framework.
Framework side:
- framework issues ACCEPT for a slave with no operations and a 365+ days filter
- framework issues REVIVE call for all roles (which should clear all filters)
- framework waits for an offer for that slave and never receives it
Master side:
- master receives ACCEPT, processes the first part and starts authorization
- master receives REVIVE and dispatches reviveOffers() to the allocator
- master receives a response from authorizer (for ACCEPT) and dispatches recoverResources() with a 365-day filter to the allocator
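The sequence above can be reproduced with a small deterministic simulation. This is only a sketch, not Mesos code: `ToyAllocator` and `simulate_master` are hypothetical names, the allocator tracks nothing but per-agent filters, and the authorizer is assumed to respond only after every incoming call has been received. Only the event names `reviveOffers` and `recoverResources` are taken from the description.

```python
from collections import deque


class ToyAllocator:
    """Hypothetical stand-in for the allocator; tracks per-agent offer filters only."""

    def __init__(self):
        self.filters = {}  # agent id -> filter duration in days

    def recover_resources(self, agent_id, filter_days):
        # Returned resources carry a filter that suppresses re-offers for the agent.
        self.filters[agent_id] = filter_days

    def revive_offers(self):
        # REVIVE clears all filters installed so far.
        self.filters.clear()

    def can_offer(self, agent_id):
        return agent_id not in self.filters


def simulate_master(calls):
    """Process calls in arrival order, with ACCEPT authorization finishing late."""
    allocator = ToyAllocator()
    dispatched = deque()          # events sent to the allocator, in dispatch order
    awaiting_authorization = []   # ACCEPT calls parked until the authorizer answers

    for call in calls:
        kind = call[0]
        if kind == "ACCEPT":
            _, agent_id, filter_days = call
            awaiting_authorization.append((agent_id, filter_days))
        elif kind == "REVIVE":
            # REVIVE needs no authorization round trip here, so it is dispatched at once.
            dispatched.append(("reviveOffers",))

    # Assumption: the authorizer responds only after all calls have arrived.
    for agent_id, filter_days in awaiting_authorization:
        dispatched.append(("recoverResources", agent_id, filter_days))

    order = []
    while dispatched:
        event = dispatched.popleft()
        order.append(event[0])
        if event[0] == "reviveOffers":
            allocator.revive_offers()
        else:
            _, agent_id, filter_days = event
            allocator.recover_resources(agent_id, filter_days)
    return allocator, order


# The framework sends ACCEPT first and REVIVE second.
allocator, order = simulate_master([("ACCEPT", "agent-1", 365), ("REVIVE",)])
print(order)                           # ['reviveOffers', 'recoverResources']
print(allocator.can_offer("agent-1"))  # False
```

In this model the filter from the earlier ACCEPT is installed after the later REVIVE has already cleared the filters, so the agent stays filtered and the framework never sees the offer it is waiting for.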
We need to give frameworks a way to avoid this kind of reordering.
Things to consider:
- V1 frameworks are not required to use a single connection for API requests; even if they were, there is still the reconnection case, during which the framework's and the master's views of the connection state might differ. This means that we cannot completely avoid this problem by sequencing the processing of requests from the same connection.
- Currently, all calls that directly influence the allocator (except for UPDATE_FRAMEWORK) return '202 ACCEPTED' at an early stage of processing. Unconditionally changing this might break compatibility with some existing frameworks.
Issue Links
- is fixed by MESOS-10056 Perform synchronous authorization for scheduler calls. (Resolved)