[MESOS-6904] Perform batching of allocations to reduce allocator queue backlogging. - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Critical
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 1.2.0
Component/s: allocation
Labels: allocator
Target Version/s: 1.2.0
Epic Link: Allocator Performance

Description

Per ~~MESOS-3157~~:

Our deployment environments have a lot of churn, with many short-lived frameworks that often revive offers. Running the allocator takes a long time (from seconds up to minutes).

In this situation, event-triggered allocation causes the event queue in the allocator process to grow very long, and the allocator effectively becomes unresponsive (e.g., a revive offers message takes too long to reach the head of the queue).

To remedy this, we propose batching the enqueued allocation operations so that a single allocation run can satisfy N enqueued allocation requests. This should reduce the potential for backlogging in the allocator. See the discussion in ~~MESOS-3157~~.
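
The batching idea can be illustrated with a minimal sketch. This is not the Mesos allocator code: the class `BatchingAllocator`, its methods, and the agent IDs are all hypothetical names chosen for illustration, and the event loop is a toy single-threaded queue. It assumes that every allocation request only needs to record which agent requires attention, so requests arriving while an allocation is already enqueued can be folded into that one pending run.

```python
from collections import deque


class BatchingAllocator:
    """Toy single-threaded event loop; all names here are hypothetical."""

    def __init__(self):
        self.queue = deque()             # pending events, processed FIFO
        self.candidates = set()          # agents awaiting allocation
        self.allocation_pending = False  # is an allocation event already queued?
        self.allocation_runs = 0         # how many expensive passes have run

    def request_allocation(self, agent_id):
        # Record the work, but enqueue at most one allocation event at a time.
        self.candidates.add(agent_id)
        if not self.allocation_pending:
            self.allocation_pending = True
            self.queue.append(self._allocate)

    def _allocate(self):
        # One expensive pass serves every request accumulated so far.
        batch, self.candidates = self.candidates, set()
        self.allocation_pending = False
        self.allocation_runs += 1
        return sorted(batch)

    def run(self):
        # Drain the event queue and collect the batch handled by each pass.
        results = []
        while self.queue:
            results.append(self.queue.popleft()())
        return results


allocator = BatchingAllocator()
for agent in ["a1", "a2", "a1", "a3"]:
    allocator.request_allocation(agent)
batches = allocator.run()
```

In this sketch, four requests result in a single queued allocation event, so other messages (such as a revive) would wait behind at most one allocation run rather than N.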

Issue Links

relates to

MESOS-3078 Recovered resources are not re-allocated until the next allocation delay.

Reviewable

supersedes

MESOS-3157 Only perform periodic resource allocations.

Resolved

People

Assignee: Jacob Janco

Reporter: Jacob Janco

Shepherd: Yan Xu

Votes: 0

Watchers: 4

Dates

Created: 11/Jan/17 03:23

Updated: 06/Feb/17 08:53

Resolved: 01/Feb/17 21:15