Thanks for updating the patch, Kuhu!
I'm confused by the nullAssigment changes, especially in LeafQueue. NULL_ASSIGNMENT was replaced with an explicit object creation but otherwise is the same. NULL_ASSIGNMENT should continue to be used for the early-out cases and normal non-assignments to avoid unnecessary object creation. It will also make the patch substantially smaller. If we're creating one because the blocked resource might be set on it then we should only create the object when we need to set the blocked resource. If we're creating it because the caller might modify the returned assignment then the caller needs to be fixed to make a copy, since a constant assignment object can be returned sometimes.
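To illustrate what I mean, here is a minimal sketch. The `Assignment` and `LeafQueueSketch` classes are hypothetical stand-ins, not the real scheduler types, and memory in MB stands in for a full Resource. It only covers the non-assignment paths: the shared constant is returned unless there is a blocked amount to report, and a caller that wants a different value takes a copy rather than mutating what it was handed.

```java
// Hypothetical stand-in for the scheduler's assignment type; not the real API.
final class Assignment {
    // Shared "nothing assigned" instance. Immutable, so it is safe to return repeatedly.
    static final Assignment NULL_ASSIGNMENT = new Assignment(0L);

    private final long blockedMb; // blocked memory in MB, 0 when nothing is blocked

    Assignment(long blockedMb) {
        this.blockedMb = blockedMb;
    }

    long getBlockedMb() {
        return blockedMb;
    }

    // Callers that need a different blocked amount get a copy instead of mutating.
    Assignment withBlockedMb(long mb) {
        return new Assignment(mb);
    }
}

final class LeafQueueSketch {
    // Returns the shared constant unless there is a blocked amount to report.
    static Assignment assign(boolean hasPendingRequest, long blockedMb) {
        if (!hasPendingRequest || blockedMb <= 0) {
            return Assignment.NULL_ASSIGNMENT; // early-out: no object creation
        }
        return new Assignment(blockedMb); // create an object only when it carries data
    }
}
```

With this shape the early-out paths stay allocation-free, and the constant cannot be corrupted by a caller.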
Does this need to be using Resources.componentwiseMax instead of Resources.max?
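For reference, the difference I have in mind, as I understand the two methods: Resources.max returns whichever operand the resource calculator ranks higher, while Resources.componentwiseMax takes the larger value of each dimension independently. The sketch below uses a hypothetical `Res` class and an assumed memory-only comparison purely for illustration; it is not the YARN implementation.

```java
// Hypothetical two-dimensional resource; stands in for YARN's Resource.
final class Res {
    final long memoryMb;
    final int vcores;

    Res(long memoryMb, int vcores) {
        this.memoryMb = memoryMb;
        this.vcores = vcores;
    }

    // Picks one operand whole, comparing by memory only (assumed comparator).
    static Res maxByMemory(Res a, Res b) {
        return a.memoryMb >= b.memoryMb ? a : b;
    }

    // Takes the larger value of each dimension independently.
    static Res componentwiseMax(Res a, Res b) {
        return new Res(Math.max(a.memoryMb, b.memoryMb), Math.max(a.vcores, b.vcores));
    }
}
```

For (8192 MB, 1 vcore) and (2048 MB, 4 vcores), the first approach yields (8192 MB, 1 vcore) and drops the larger vcore count, while the componentwise version yields (8192 MB, 4 vcores).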
Rather than subtracting the blocked resources from each result of getResourceLimitsOfChild, I think it would be better to adjust the parent limits that are already passed to getResourceLimitsOfChild if a child reports blocked resources.
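Roughly what I am picturing, as a sketch only. `ParentLimitSketch` is hypothetical, memory in MB stands in for a full Resource, and the `Math.min` line is an assumed simplification of what getResourceLimitsOfChild computes. The point is that the parent limit shrinks once, when a child reports blocked resources, and every later child sees the reduced limit.

```java
final class ParentLimitSketch {
    // Returns the limit handed to each child, visiting children in order.
    // blockedMb[i] is the amount child i reports as blocked after its turn.
    static long[] childLimits(long parentLimitMb, long[] childMaxMb, long[] blockedMb) {
        long[] limits = new long[childMaxMb.length];
        long remainingParentMb = parentLimitMb;
        for (int i = 0; i < childMaxMb.length; i++) {
            // Assumed stand-in for getResourceLimitsOfChild: cap the child by the parent limit.
            limits[i] = Math.min(childMaxMb[i], remainingParentMb);
            // Adjust the parent limit itself so later children inherit the reduction.
            remainingParentMb = Math.max(0L, remainingParentMb - blockedMb[i]);
        }
        return limits;
    }
}
```

With a 10240 MB parent limit, three children capped at 8192 MB each, and the first child reporting 6144 MB blocked, the children see 8192, 4096, and 4096 MB respectively.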
finalBlockedLimits should just be a Resource instead of a ResourceLimit. It only ever uses the Resource within the ResourceLimit in practice, and it is just a Resource total anyway.
Some debug logs when we're asking for blocked resources in the assignment or applying them to parent limits would be helpful for analysis and debugging.
I'm confused why we're checking the headroom to determine the amount of blocked resources. IIRC the headroom is a combination of the user limits and the queue limits. We only want to report blocked resources when we are blocked by the queue limits. If the user cannot make a reservation only due to the user's own limits then we don't want to report any blocked resources. We only want to report resources when we would have either allocated or made a reservation but the queue's limits prevent the full allocation. Then, and only then, we want to report the blocked resources as the amount remaining available in the queue so those resources are reserved relative to other queues until we are able to make the full allocation or reservation.
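In sketch form, the rule I would expect looks something like the following. `BlockedResourceSketch` and its parameters are hypothetical, and memory in MB stands in for a full Resource. The user limit and the queue limit are checked separately rather than through the combined headroom.

```java
final class BlockedResourceSketch {
    // Returns the amount to report as blocked, in MB.
    static long blockedMb(long requestMb, long userRemainingMb, long queueRemainingMb) {
        if (requestMb > userRemainingMb) {
            // The user's own limit prevents the request: report nothing.
            return 0L;
        }
        if (requestMb <= queueRemainingMb) {
            // The request fits, so it is allocated or reserved normally.
            return 0L;
        }
        // Only the queue limit is in the way: report what remains in the queue.
        return Math.max(0L, queueRemainingMb);
    }
}
```

Note that the reported amount never exceeds what the queue has left, so the queue cannot hold back more capacity than it could actually use.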
On a related note, I'm confused about why LeafQueue is subtracting the headroom from the blocked resources. What does this represent? It seems like this could report more blocked resources than the queue has available, which would allow the queue to influence more capacity than its configured max.
With this approach, I think allocations will be skipped for other queues until this 8GB is served.
If I understand Sunil's question properly, then yes, it will block other queues under the parent queue until that 8GB is served, and that is exactly what is needed to solve the problem. Let me restate the scenario to make sure I am understanding it properly. By "One queue is under served and it has a single pending demand for 8GB" I assume you mean a leaf queue that wants to allocate 8GB and would normally be able to, but the parent's usage is such that less than 8GB is available in the parent. In other words, this is a failure to reserve due to parental limits. In this case, if we fail to block the sibling queues from allocating their smaller 2GB requests, then we have the same type of scenario as in the JIRA description: a higher-priority queue that is indefinitely starved by lower-priority queues because it can't reserve the remaining resources. So yes, we need the other queues to stop allocating until the higher-priority queue's allocation is satisfied. Otherwise we have priority inversion and indefinite postponement.
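A toy simulation of that scenario, under assumptions I am making up for illustration: an 8GB parent that starts full of 2GB containers from the lower-priority sibling, one of which finishes per scheduling step, and a higher-priority queue waiting on a single 8GB request. `StarvationSketch` is hypothetical and ignores everything else the scheduler does.

```java
final class StarvationSketch {
    // Returns the step at which the 8 GB request is served,
    // or -1 if it is not served within maxSteps.
    static int stepsUntilServed(boolean blockSiblings, int maxSteps) {
        final int parentCapacityGb = 8;
        final int bigRequestGb = 8;
        final int smallRequestGb = 2;
        int usedGb = parentCapacityGb; // parent starts full of 2 GB containers
        for (int step = 1; step <= maxSteps; step++) {
            usedGb -= smallRequestGb; // one running 2 GB container finishes
            int freeGb = parentCapacityGb - usedGb;
            if (freeGb >= bigRequestGb) {
                return step; // higher-priority queue finally gets its 8 GB
            }
            if (!blockSiblings && freeGb >= smallRequestGb) {
                usedGb += smallRequestGb; // sibling immediately reuses the freed space
            }
        }
        return -1;
    }
}
```

With blocking enabled the freed space accumulates and the 8GB request is served at step 4. Without it, the sibling reclaims every freed 2GB and the request is never served, no matter how many steps are allowed.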