The allocation logic has grown organically and is now very hard to read and maintain. This epic will track cleanups to improve the readability of the core allocation logic:
- Add a function that returns the subset of frameworks capable of receiving offers from a given agent. This moves the capability checking out of the core allocation logic, so the loops can iterate over a smaller set of framework candidates instead of needing 'continue' cases. This covers the GPU_RESOURCES and REGION_AWARE capabilities.
- Similarly, add a function that filters resources based on framework capabilities. This pulls the filtering logic out of the core allocation logic, which can then just call out to the capability filtering function. This covers the SHARED_RESOURCES, REVOCABLE_RESOURCES and RESERVATION_REFINEMENT capabilities. Note that to implement this, we must refactor the shared resources logic so that resource generation occurs regardless of the framework's capabilities (the generated resources are then filtered out by this new function if the framework is not capable).
- Update the scalar quantity related functions to also strip static reservation metadata. Currently, many places in the allocator (including the allocation logic) carry extra code to do this at the call sites.
- Either track the following across allocation cycles or pull their computation out into functions: the quantity of quota currently "charged" to a role, and the amount of "headroom" needed/available for unsatisfied quota guarantees.
- Pull the resource shrinking logic out into its own function.
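
As a rough illustration of the first two items, the two filtering helpers could look something like the sketch below. All of the types and names here (`Framework`, `Agent`, `Resource`, `capableFrameworks`, `stripIncapable`) are hypothetical stand-ins made up for this example, not the allocator's real types or signatures; only the capability names come from the list above.

```cpp
#include <set>
#include <string>
#include <vector>

// NOTE: every type and function here is a hypothetical stand-in used for
// illustration only; none of them are the allocator's real definitions.
enum class Capability {
  GPU_RESOURCES,
  REGION_AWARE,
  SHARED_RESOURCES,
  REVOCABLE_RESOURCES,
  RESERVATION_REFINEMENT
};

struct Framework {
  std::string id;
  std::set<Capability> capabilities;

  bool has(Capability c) const { return capabilities.count(c) > 0; }
};

struct Agent {
  bool hasGpus;       // Assumption: the agent exposes GPU resources.
  bool remoteRegion;  // Assumption: the agent is in a non-local region.
};

struct Resource {
  std::string name;
  bool shared;
  bool revocable;
  bool refinedReservation;
};

// Returns only the frameworks that may receive offers from `agent`, so the
// allocation loops no longer need per-capability 'continue' cases.
std::vector<Framework> capableFrameworks(
    const std::vector<Framework>& frameworks,
    const Agent& agent)
{
  std::vector<Framework> result;
  for (const Framework& framework : frameworks) {
    if (agent.hasGpus && !framework.has(Capability::GPU_RESOURCES)) {
      continue;
    }
    if (agent.remoteRegion && !framework.has(Capability::REGION_AWARE)) {
      continue;
    }
    result.push_back(framework);
  }
  return result;
}

// Drops the resources that `framework` is not capable of receiving. The
// caller generates all resources unconditionally and filters afterwards.
std::vector<Resource> stripIncapable(
    const std::vector<Resource>& resources,
    const Framework& framework)
{
  std::vector<Resource> result;
  for (const Resource& resource : resources) {
    if (resource.shared &&
        !framework.has(Capability::SHARED_RESOURCES)) {
      continue;
    }
    if (resource.revocable &&
        !framework.has(Capability::REVOCABLE_RESOURCES)) {
      continue;
    }
    if (resource.refinedReservation &&
        !framework.has(Capability::RESERVATION_REFINEMENT)) {
      continue;
    }
    result.push_back(resource);
  }
  return result;
}
```

With helpers of this shape, the core loop would iterate over `capableFrameworks(...)` and pass each candidate's resources through `stripIncapable(...)`, keeping the capability checks in one place.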