[MESOS-10023] Allocator method dispatches can be reordered (relative to scheduler API calls which triggered them). - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: 1.9.0
Fix Version/s: 1.10.0
Component/s: None
Labels:
- foundations

Sprint: Foundations: RI-20 59, Studio 4: RI-21 60, Studio 4: RI-21 61, Studio 4: RI-22 62, Studio 4: RI-23 64
Story Points: 8

Description

An example of such reordering was observed on a test cluster running a V1 framework.
Framework side:

1. The framework issues ACCEPT for a slave with no operations and a filter of 365+ days.
2. The framework issues REVIVE for all roles, which should clear all filters.
3. The framework waits for an offer for that slave and never receives one.

Master side:

1. The master receives ACCEPT, processes the first part of the call, and starts authorization.
2. The master receives REVIVE and dispatches reviveOffers() to the allocator.
3. The master receives the authorizer's response (for ACCEPT) and dispatches recoverResources() with the 365-day filter to the allocator.

The filter is therefore installed after reviveOffers() has already run, so it is never cleared.
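
The master-side sequence can be reproduced with a small toy model. This is only a sketch of the ordering problem, not Mesos code: `ToyAllocator`, `run_master`, and the dictionary shape of the calls are hypothetical names invented for illustration, and the asynchronous authorizer is modelled by simply deferring the ACCEPT dispatch until all other calls have been handled.

```python
from collections import deque


class ToyAllocator:
    """Hypothetical stand-in for the allocator; tracks offer filters per slave."""

    def __init__(self):
        self.filters = {}  # slave id -> filter duration in days

    def recover_resources(self, slave_id, filter_days):
        # Unused resources are returned together with a filter.
        self.filters[slave_id] = filter_days

    def revive_offers(self):
        # Clears only the filters that are installed at this moment.
        self.filters.clear()


def run_master(calls):
    """Handle calls in arrival order. ACCEPT defers its allocator dispatch
    until the (simulated) asynchronous authorization completes."""
    allocator = ToyAllocator()
    pending_authorizations = deque()
    dispatch_log = []

    for call in calls:
        if call["type"] == "ACCEPT":
            # Authorization starts here; the allocator dispatch waits for it.
            pending_authorizations.append(call)
        elif call["type"] == "REVIVE":
            allocator.revive_offers()
            dispatch_log.append("reviveOffers")

    # Assumption: authorizer responses arrive after the later calls were handled.
    while pending_authorizations:
        call = pending_authorizations.popleft()
        allocator.recover_resources(call["slave_id"], call["filter_days"])
        dispatch_log.append("recoverResources")

    return allocator, dispatch_log


calls = [
    {"type": "ACCEPT", "slave_id": "slave-1", "filter_days": 365},
    {"type": "REVIVE"},
]
allocator, dispatch_log = run_master(calls)
print(dispatch_log)       # ['reviveOffers', 'recoverResources']
print(allocator.filters)  # {'slave-1': 365}
```

Although the framework sent ACCEPT before REVIVE, the allocator sees the two dispatches in the opposite order and the 365-day filter survives.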

We need to give frameworks a way to avoid this kind of reordering.

Things to consider:

- V1 frameworks are not required to use a single connection for API requests. Even if they were, the reconnection case would remain: during a reconnect, the framework's and the master's views of the connection state might differ. This means that we cannot completely avoid the problem by sequencing the processing of requests from the same connection.

- Currently, all calls that directly influence the allocator (except UPDATE_FRAMEWORK) return '202 ACCEPTED' at an early stage of processing. Changing this unconditionally might break compatibility with some existing frameworks.
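
The linked fix, MESOS-10056, is titled "Perform synchronous authorization for scheduler calls". The sketch below is not that implementation. It is a toy model, under the assumption that authorization of a call finishes before the next call is handled, showing why such an approach would keep allocator dispatches in arrival order. `ToyAllocator`, `authorize`, and `process_calls` are hypothetical names.

```python
class ToyAllocator:
    """Hypothetical stand-in for the allocator; tracks offer filters per slave."""

    def __init__(self):
        self.filters = {}  # slave id -> filter duration in days

    def recover_resources(self, slave_id, filter_days):
        self.filters[slave_id] = filter_days

    def revive_offers(self):
        self.filters.clear()


def authorize(call):
    # Hypothetical authorizer that answers immediately and allows everything.
    return True


def process_calls(calls):
    """Handle each call completely, including authorization, before the next."""
    allocator = ToyAllocator()
    dispatch_log = []

    for call in calls:
        if call["type"] == "ACCEPT":
            # Authorization completes here, so the dispatch cannot be overtaken.
            if authorize(call):
                allocator.recover_resources(call["slave_id"], call["filter_days"])
                dispatch_log.append("recoverResources")
        elif call["type"] == "REVIVE":
            allocator.revive_offers()
            dispatch_log.append("reviveOffers")

    return allocator, dispatch_log


calls = [
    {"type": "ACCEPT", "slave_id": "slave-1", "filter_days": 365},
    {"type": "REVIVE"},
]
allocator, dispatch_log = process_calls(calls)
print(dispatch_log)       # ['recoverResources', 'reviveOffers']
print(allocator.filters)  # {}
```

In this model the filter is installed first and then cleared by REVIVE, which matches what the framework expects.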

Issue Links

is fixed by

MESOS-10056 Perform synchronous authorization for scheduler calls. (Resolved)

People

Assignee: Andrei Sekretenko

Reporter: Andrei Sekretenko

Votes: 0

Watchers: 2

Dates

Created: 30/Oct/19 18:05

Updated: 03/Mar/20 14:27

Resolved: 03/Mar/20 14:27