[MESOS-8935] Quota limit "chopping" can lead to cpu-only and memory-only offers. - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: 1.4.2, 1.5.0, 1.5.1, 1.6.0
Fix Version/s: 1.4.2, 1.5.2, 1.6.1, 1.7.0
Component/s: allocation
Labels: None
Sprint: Mesosphere Sprint 2018-23
Story Points: 3

Description

When the allocator allocates resources to a role, it "chops" the agent's available resources down to the role's quota limit (per MESOS-7099). This prevents the role from exceeding its quota limit.

This has the unintended consequence of creating cpu-only and memory-only offers.

Consider agents that each have 10 cpus and 100 GB mem, and roles that each have a quota guarantee/limit of 5 cpus and 10 GB mem. The following allocations will occur:

agent 1:
r1 -> 5 cpus 10GB mem
r2 -> 5 cpus 10GB mem
r3 -> 0 cpus 10GB mem (quota allocation proceeds even if it can make progress towards only a single resource, and MESOS-1688 allows this)
r4 -> 0 cpus 10GB mem
...
r10 -> 0 cpus 10GB mem

agent 2:
r3 -> 5 cpus 0GB mem (r3 is already at its 10GB mem limit)
r4 -> 5 cpus 0GB mem
r11 -> 0 cpus 10GB mem
...
r20 -> 0 cpus 10GB mem

Here, roles 3-20 receive memory-only and cpu-only offers. The problem is further exacerbated if DRF chooses the same ordering between roles across allocation cycles, since the same roles then keep receiving the single-resource offers.
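The allocations listed above can be reproduced with a small simulation. This is only a sketch of the "chopping" behavior described in this ticket, not the actual Mesos allocator code: the function name `allocate`, the dictionary layout, and the fixed role ordering (standing in for DRF picking the same order every time) are all assumptions made for illustration.

```python
# Hypothetical model of quota "chopping"; names and data layout are
# illustrative assumptions, not Mesos APIs.

def allocate(agents, roles, quota):
    """Walk agents in order and, per agent, roles in a fixed order.

    Each role is given min(available on agent, remaining quota headroom)
    for every resource independently, which is what allows offers that
    contain only cpus or only mem.
    """
    consumed = {role: {res: 0 for res in quota} for role in roles}
    offers = {}
    for agent, total in agents.items():
        available = dict(total)
        offers[agent] = []
        for role in roles:
            # Chop each resource down to the role's remaining headroom.
            chopped = {
                res: min(available[res], quota[res] - consumed[role][res])
                for res in quota
            }
            if not any(chopped.values()):
                continue  # Nothing left to give, or the role is at its limit.
            for res, amount in chopped.items():
                available[res] -= amount
                consumed[role][res] += amount
            offers[agent].append((role, chopped))
    return offers


agents = {
    "agent1": {"cpus": 10, "mem": 100},
    "agent2": {"cpus": 10, "mem": 100},
}
roles = [f"r{i}" for i in range(1, 21)]
quota = {"cpus": 5, "mem": 10}  # Same guarantee/limit for every role.

offers = allocate(agents, roles, quota)

# Roles that received at least one offer missing cpus or mem entirely.
single_resource_roles = sorted(
    {
        role
        for agent_offers in offers.values()
        for role, resources in agent_offers
        if 0 in resources.values()
    },
    key=lambda name: int(name[1:]),
)

for agent, agent_offers in offers.items():
    print(agent)
    for role, resources in agent_offers:
        print(f"  {role} -> {resources['cpus']} cpus {resources['mem']}GB mem")
```

Under these assumptions only r1 and r2 receive offers that contain both cpus and mem; `single_resource_roles` holds every other role, r3 through r20.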

Issue Links

blocks
MESOS-8626 The 'allocatable' check in the allocator is problematic with multi-role frameworks (Resolved)

is related to
MESOS-8939 Allow frameworks to specify fine-grained demand to Mesos (Open)
MESOS-8938 Improve the allocator so that it does not send cpu only or mem only resources in certain cases (Open)

relates to
MESOS-8936 Implement a Random Sorter for offer allocations. (Resolved)

supersedes
MESOS-8626 The 'allocatable' check in the allocator is problematic with multi-role frameworks (Resolved)

People

Assignee: Meng Zhu
Reporter: Benjamin Mahler
Shepherd: Greg Mann
Votes: 0
Watchers: 6

Dates

Created: 18/May/18 23:00
Updated: 02/Jul/18 23:26
Resolved: 21/Jun/18 01:01