The current allocation strategy is to make "coarse-grained" offers to the frameworks, wherein each offer will contain all of the resources currently available on the agent to the framework.
However, this "coarse-grained" invariant does not hold over time: as resources are freed and additional offers can be made, we make another "coarse-grained" offer without rescinding any existing outstanding offers.
This leads to fragmentation of the offers for an agent (i.e. there may be multiple outstanding offers, to one or more frameworks, for the available resources on an agent). There are a number of issues with this:
(1) In the case where the fragmented offers have been sent to multiple frameworks, it's possible that none of the frameworks is offered sufficient resources to run anything. As the schedulers decline or hold on to these offers, it may take a long time to make progress.
(2) A simple scheduler may be implemented without holding and merging offers, since doing so adds complexity (e.g. how long should it hold on to offers? offer management / matching becomes more involved). For such a scheduler there are pathological cases where the framework might not receive the single un-fragmented offer: each time the allocator makes an offer, it still sees an outstanding offer because the DECLINE has not yet been processed.
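Issue (1) can be illustrated with a toy model. This is only a sketch: the `Agent` class, the `offer_without_rescind` helper, and the framework names are hypothetical and are not part of the Mesos allocator, and resources are reduced to a single CPU count.

```python
from dataclasses import dataclass, field


@dataclass
class Agent:
    # Hypothetical model of one agent: CPUs that are free and not yet
    # offered, plus the list of outstanding (framework, cpus) offers.
    available: float = 0.0
    outstanding: list = field(default_factory=list)


def offer_without_rescind(agent, framework):
    """Offer everything currently available, leaving earlier offers in place."""
    if agent.available > 0:
        agent.outstanding.append((framework, agent.available))
        agent.available = 0.0


agent = Agent()

agent.available += 2.0                       # a task finishes, freeing 2 CPUs
offer_without_rescind(agent, "framework-a")

agent.available += 2.0                       # another task finishes
offer_without_rescind(agent, "framework-b")

# Both frameworks want to launch a 4-CPU task. The agent has 4 CPUs free in
# total, but they are split across two offers, so neither framework can use them.
task_cpus = 4.0
can_launch = [fw for fw, cpus in agent.outstanding if cpus >= task_cpus]
print(agent.outstanding)
print(can_launch)
```

Each individual offer was "coarse-grained" at the moment it was made, yet the outstanding set as a whole is fragmented.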
The suggestion in this ticket is to explore imposing the "coarse-grained" invariant by avoiding fragmenting the offers across multiple frameworks and even for the same framework (we should look at these somewhat separately). This can be achieved if the allocator has visibility into the offers and rescinds outstanding offers for the agent prior to offering additionally freed resources on the agent.
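A minimal sketch of the rescind-before-offer idea, assuming the allocator can see outstanding offers. The `Agent` class and `offer_with_rescind` function are hypothetical names for illustration only, resources are reduced to a single CPU count, and the choice of which framework receives the merged offer (fair sharing in practice) is left to the caller.

```python
from dataclasses import dataclass, field


@dataclass
class Agent:
    # Hypothetical model of one agent: CPUs that are free and not yet
    # offered, plus the list of outstanding (framework, cpus) offers.
    available: float = 0.0
    outstanding: list = field(default_factory=list)


def offer_with_rescind(agent, framework):
    """Rescind all outstanding offers for the agent, then make one offer
    containing the rescinded resources plus anything newly freed.
    Returns the amount of CPU that had to be rescinded."""
    rescinded = sum(cpus for _, cpus in agent.outstanding)
    agent.outstanding.clear()

    total = agent.available + rescinded
    agent.available = 0.0
    if total > 0:
        agent.outstanding.append((framework, total))
    return rescinded


agent = Agent()

agent.available += 2.0                       # a task finishes, freeing 2 CPUs
offer_with_rescind(agent, "framework-a")

agent.available += 2.0                       # another task finishes
rescinded = offer_with_rescind(agent, "framework-a")

# Only one offer is outstanding for the agent, and it covers everything.
print(agent.outstanding)
```

The invariant here is that at most one offer is ever outstanding per agent. The cost is the rescind itself, which is counted in the return value.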
Note, however, that this also has some negative implications for scheduling throughput. Consider the case where there is a high degree of churn on an agent due to a large number of small, short-lived tasks. In this case, the framework would experience a lot of scheduling interference: it tries to accept offers, but the offers are frequently rescinded as the allocator attempts to un-fragment them. There may be ways to mitigate this. If we had a mechanism for "swapping" an offer to a scheduler, then we could allow operations that were sent before the scheduler saw the swap to still be applied against the swapped-in offer with more resources. We would have to try to stick to the same scheduler for an agent, so that we swap the offer for a single scheduler rather than rescinding from one scheduler and sending a new offer to a different scheduler. It may be that different frameworks desire different behavior here.
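One possible reading of the "swapping" idea is sketched below. No such mechanism exists in the ticket beyond the description above, so everything here is an assumption: the `SwappingAllocator` class, its methods, the offer ID format, and the rule that an accept sent against a swapped-out offer ID is redirected to its replacement.

```python
class SwappingAllocator:
    """Hypothetical allocator that grows an offer in place instead of
    rescinding it, remembering which offer replaced which."""

    def __init__(self):
        self.offers = {}    # live offer id -> cpus
        self.swapped = {}   # swapped-out offer id -> replacement offer id
        self._next_id = 0

    def _new_id(self):
        self._next_id += 1
        return f"offer-{self._next_id}"

    def offer(self, cpus):
        offer_id = self._new_id()
        self.offers[offer_id] = cpus
        return offer_id

    def swap(self, old_id, extra_cpus):
        """Replace old_id with a larger offer for the same scheduler."""
        new_id = self._new_id()
        self.offers[new_id] = self.offers.pop(old_id) + extra_cpus
        self.swapped[old_id] = new_id
        return new_id

    def accept(self, offer_id, cpus):
        """Consume an offer. Returns the unused CPUs, or None if the offer
        is gone or too small. An accept against a swapped-out id is
        redirected to the offer that replaced it."""
        while offer_id in self.swapped:
            offer_id = self.swapped[offer_id]
        offered = self.offers.get(offer_id)
        if offered is None or cpus > offered:
            return None
        del self.offers[offer_id]
        return offered - cpus


allocator = SwappingAllocator()
first = allocator.offer(2.0)

# 2 more CPUs are freed; the allocator swaps instead of rescinding.
second = allocator.swap(first, 2.0)

# The scheduler had not yet seen the swap and accepts the old offer id.
# The operation still goes through, against the swapped-in offer.
leftover = allocator.accept(first, 1.0)
print(leftover)
```

Under this reading, an in-flight accept is not lost when the allocator un-fragments, which is the interference the paragraph above is concerned with. Whether unused resources from the larger offer should return to the allocator or stay with the scheduler is left open.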
This problem should also be examined in the context of optimistic resource allocation.