In MESOS-9802, we removed the quota role sorter, which was tech debt.
However, this slows down the allocator. In the first allocation stage, even if the cluster has no active roles with non-default quota, the allocator now has to sort and iterate over every role in the cluster. Benchmark results show that with 1k roles and 2k frameworks, the allocator can experience ~50% performance degradation.
There are a couple of ways to address this issue. For example, we could make the sorter aware of quota and add a method, say `sortQuotaRoles`, that returns all the roles with non-default quota. Alternatively, an even better approach would be to deprecate the sorter concept and just have two standalone functions, e.g. `sortRoles()` and `sortQuotaRoles()`, that take in the role tree structure (which does not yet exist in the allocator) and return the sorted roles.
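A minimal sketch of the second option, to make the idea concrete. Everything here is illustrative: the `Role` struct is a hypothetical stand-in for the role tree (which does not exist in the allocator yet), `share` is an assumed sort key such as a DRF dominant share, and `sortedNames` is a made-up helper. None of this is existing Mesos code.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical stand-in for a role tree node; not an existing Mesos type.
struct Role {
  std::string name;
  double share;   // Assumed sort key, e.g. a DRF dominant share.
  bool hasQuota;  // True if the role has a non-default quota.
};

// Hypothetical helper: selects the roles accepted by `keep` and returns
// their names ordered by ascending share, breaking ties by name.
template <typename Pred>
std::vector<std::string> sortedNames(const std::vector<Role>& roles, Pred keep)
{
  std::vector<const Role*> selected;
  for (const Role& role : roles) {
    if (keep(role)) {
      selected.push_back(&role);
    }
  }

  std::sort(
      selected.begin(),
      selected.end(),
      [](const Role* a, const Role* b) {
        if (a->share != b->share) {
          return a->share < b->share;
        }
        return a->name < b->name;
      });

  std::vector<std::string> names;
  names.reserve(selected.size());
  for (const Role* role : selected) {
    names.push_back(role->name);
  }
  return names;
}

// Sorts every role in the cluster.
std::vector<std::string> sortRoles(const std::vector<Role>& roles)
{
  return sortedNames(roles, [](const Role&) { return true; });
}

// Sorts only the roles with non-default quota, so the first allocation
// stage does no sorting work for roles without quota.
std::vector<std::string> sortQuotaRoles(const std::vector<Role>& roles)
{
  return sortedNames(roles, [](const Role& role) { return role.hasQuota; });
}
```

In this sketch `sortQuotaRoles` still scans all roles once to filter them, but it only sorts the quota roles. A real role tree could track the quota roles separately and avoid the scan as well.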
In addition, the implementation of MESOS-8068 requires more work during the allocation cycle. In particular, we need to call `shrink` many more times than before. All of this contributes to the slowdown. Specifically, for the quota-oriented benchmark `HierarchicalAllocator_WithQuotaParam.LargeAndSmallQuota/2`, we observe a 2-3x slowdown compared to the previous release (1.8.1):
Benchmark setup: 3000 agents, 3000 roles, 3000 frameworks, with drf sorter
Made 3500 allocations in 32.051382735secs
Made 0 allocation in 27.976022773secs
Made 3500 allocations in 13.810811063secs
Made 0 allocation in 9.885972984secs