Details
- Type: Bug
- Status: Resolved
- Priority: Major
- Resolution: Won't Fix
Description
Our deployment environments have a lot of churn, with many short-lived frameworks that often revive offers. Running the allocator takes a long time (from seconds up to minutes).
In this situation, event-triggered allocation causes the event queue in the allocator process to get very long, and the allocator effectively becomes unresponsive (e.g. a revive offers message takes too long to come to the head of the queue).
We have been running a patch to remove all the event-triggered allocations and only allocate periodically on the allocation interval. This works great and really improves responsiveness.
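To make the queueing effect concrete, here is a toy model of a single-threaded allocator queue. It is not Mesos code: `simulate`, its parameters, and all the numbers are made up for illustration, and it assumes the next periodic run is scheduled one interval after the previous run finishes. Times are integer milliseconds.

```python
def simulate(arrivals, alloc_cost, event_cost, interval=None):
    """Worst wait (ms) between an event arriving and the allocator handling it.

    arrivals:   sorted event arrival times in ms (hypothetical workload)
    alloc_cost: duration of one full allocation run
    event_cost: duration of handling an event without allocating
    interval:   None means allocate after every event (event-triggered);
                otherwise allocate only periodically with this delay
    """
    clock = 0              # time at which the allocator is next free
    next_tick = interval   # next periodic allocation, if any
    worst = 0
    for t in arrivals:
        if interval is not None:
            # Run any periodic allocation that is due before this event.
            while next_tick <= max(clock, t):
                clock = max(clock, next_tick) + alloc_cost
                next_tick = clock + interval
        start = max(clock, t)          # event waits until the allocator is free
        worst = max(worst, start - t)
        clock = start + event_cost
        if interval is None:
            clock += alloc_cost        # every event also triggers an allocation
    return worst


# 100 events, one every 100 ms; each allocation run takes 2 s.
arrivals = list(range(0, 10_000, 100))
event_triggered = simulate(arrivals, alloc_cost=2000, event_cost=1)
periodic_only = simulate(arrivals, alloc_cost=2000, event_cost=1, interval=5000)
print(event_triggered, periodic_only)
```

With these invented numbers, the worst wait in the event-triggered case grows with the number of events (about 188 s here), while in the periodic-only case it is bounded by roughly one allocation run (2 s).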
Issue Links
- contains
  - MESOS-2285 Eliminate dependency on master::Flags in Allocator (Resolved)
- duplicates
  - MESOS-4766 Improve allocator performance. (Resolved)
  - MESOS-4767 Apply batching to allocation events to reduce allocator backlogging. (Resolved)
- is related to
  - MESOS-4102 Quota doesn't allocate resources on slave joining. (Resolved)
  - MESOS-3353 generic mechanism to smuggle allocation options (Open)
  - MESOS-4694 DRFAllocator takes very long to allocate resources with a large number of frameworks (Resolved)
- is superseded by
  - MESOS-6904 Perform batching of allocations to reduce allocator queue backlogging. (Resolved)
- relates to
  - MESOS-3078 Recovered resources are not re-allocated until the next allocation delay. (Reviewable)