Details
- Type: Bug
- Status: Resolved
- Priority: Major
- Resolution: Fixed
Description
We currently run into an issue with the DRF scheduler in which some frameworks never receive offers (see https://github.com/mesosphere/marathon/issues/1931 for details).
Imagine that we have 10 frameworks and unallocated resources from a single slave.
The allocation interval is 1 sec, and refuse_seconds (i.e. the time for which a declined resource is filtered) is 3 sec for all frameworks.
The allocator offers the resources to framework 1 (according to DRF), which declines the offer immediately.
In the next allocation interval, framework 1 is skipped because of its earlier decline. Hence the resources are offered to the next framework, framework 2, which also declines.
The same happens in the following allocation interval with framework 3.
In the allocation interval after that, the refuse_seconds filter for framework 1 has expired, and since it still has the lowest DRF share it is offered the resources again and declines again. The cycle then repeats.
Framework 4 (which is actually waiting for these resources) is never offered them.
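The cycle described above can be reproduced with a small toy model. This is only a sketch and not Mesos allocator code: the function `simulate` and all of its parameters are hypothetical, it assumes every framework has the same DRF share (so ties are broken by framework id), and it assumes a filter expires exactly `refuse_seconds` after the decline.

```python
def simulate(num_frameworks=10, wanting=4, refuse_seconds=3, interval=1, ticks=30):
    """Toy model of the offer cycle; returns (offer order, whether `wanting` got an offer)."""
    filtered_until = {}  # framework id -> time at which its decline filter expires
    offers = []          # framework ids in the order they were offered the resources
    for step in range(ticks):
        now = step * interval
        # Assumption: nothing is ever allocated, so all DRF shares are equal
        # and the sorter falls back to framework id order.
        for fw in range(1, num_frameworks + 1):
            if filtered_until.get(fw, 0) > now:
                continue  # still filtered because of an earlier decline
            offers.append(fw)
            if fw == wanting:
                return offers, True  # the waiting framework accepts the offer
            # Every other framework declines immediately and is filtered.
            filtered_until[fw] = now + refuse_seconds
            break  # only one offer of the single slave's resources per interval
    return offers, False


if __name__ == "__main__":
    offers, accepted = simulate()
    print(offers[:9], accepted)
```

With the values from the description (interval 1 sec, refuse_seconds 3 sec), the offers in this model rotate among frameworks 1, 2 and 3 only, and framework 4 is never reached. In the same model, framework 4 is reached only once refuse_seconds exceeds three allocation intervals, e.g. `simulate(refuse_seconds=4)`.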
Issue Links
- is contained by
  - MESOS-1791 Introduce Master / Offer Resource Reservations aka Quota. (Resolved)
- is duplicated by
  - MESOS-2546 Mesos 0.20.1 causes framework starvation on single node clusters when using Chronos and Marathon (Resolved)
  - MESOS-6112 Frameworks are starved when > 5 are run concurrently (Resolved)
- is related to
  - MESOS-1086 DRF allocator should take into account past allocations when determining an ordering so frameworks are not starved. (Resolved)
  - MESOS-1791 Introduce Master / Offer Resource Reservations aka Quota. (Resolved)
  - MESOS-6112 Frameworks are starved when > 5 are run concurrently (Resolved)
  - MESOS-8936 Implement a Random Sorter for offer allocations. (Resolved)
- relates to
  - MESOS-4302 Offer filter timeouts are ignored if the allocator is slow or backlogged. (Resolved)