Wangda Tan, thank you for your review, and sorry for the late reply.
- Why is MAX_PENDING_OVER_CAPACITY needed? I think this could be problematic. For example, if a queue has capacity = 50, its usage is 10, and it has 45 pending resources, then with MAX_PENDING_OVER_CAPACITY=0.1 the queue cannot preempt resources from other queues.
Sorry for the poor naming. The value is not checked against the queue's capacity; it is checked as a percentage over the queue's currently used resources. I changed the name to MAX_PENDING_OVER_CURRENT.
As you know, there are multiple reasons why the preemption monitor could preempt resources unnecessarily (I call it "flapping"). The lack of consideration for user limit factor is only one of them. Another is that an app could be requesting one 8-gig container, and the preemption monitor could conceivably preempt eight 1-gig containers, which would then be rejected by the requesting AM and potentially given right back to the preempted app.
The MAX_PENDING_OVER_CURRENT buffer is an attempt to alleviate that particular flapping situation by adding a buffer zone above the currently used resources of a queue. That is, the preemption monitor shouldn't treat queue B as asking for pending resources unless the pending resources on queue B exceed a configured percentage of the resources currently used on queue B.
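To make the intent concrete, here is a minimal sketch of the check, not the patch code. It assumes plain megabyte counts instead of Resource objects, and it assumes that pending below the buffer is simply treated as zero; the class and method names are hypothetical.

```java
// Hypothetical illustration of the MAX_PENDING_OVER_CURRENT buffer.
// Assumption: resources are plain megabyte counts, not YARN Resource objects.
class PendingBufferSketch {

    // Returns the pending amount the preemption monitor would consider.
    // Pending is ignored (treated as 0) unless it exceeds the configured
    // fraction of the queue's currently used resources.
    static long effectivePending(long usedMb, long pendingMb,
                                 double maxPendingOverCurrent) {
        double threshold = usedMb * maxPendingOverCurrent;
        return pendingMb > threshold ? pendingMb : 0L;
    }

    public static void main(String[] args) {
        // The example from the review: used = 10, pending = 45, buffer = 0.1.
        // The threshold is 10 * 0.1 = 1, so the pending request still counts.
        System.out.println(effectivePending(10, 45, 0.1));
        // A small request relative to usage falls inside the buffer.
        System.out.println(effectivePending(100, 8, 0.1));
    }
}
```

With the numbers from your example, the threshold is computed from the 10 used, not from the capacity of 50, so the queue's 45 pending would still be considered.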
If you want, we can pull this out and put it as part of a different JIRA so we can document and discuss that particular flapping situation separately.
- In LeafQueue, it uses getHeadroom() to compute how many resources the user can use. But I think it may not be correct: ... For the above queue status, headroom for a.a1 is 0 since queue-a's currentResourceLimit is 0.
So instead of using headroom, I think you can use computed-user-limit - user.usage(partition) as the headroom. You don't need to consider the queue's max capacity here, since we will consider the queue's max capacity in the subsequent logic of PCPP.
Yes, you are correct. getHeadroom() could compute zero headroom when we don't want it to. I also agree that we don't need to cap pending resources at the queue's max capacity when calculating them. The point of this fix is that the user limit factor should be taken into account and should cap the pending value. The max queue capacity will be considered during the offer stage of the preemption calculations.
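As a rough sketch of the calculation you suggested, assuming plain megabyte counts for a single partition rather than Resource objects (the class and method names below are hypothetical and not from the patch):

```java
// Hypothetical illustration: cap a user's pending resources by
// (computed user limit - user's current usage), ignoring queue max capacity.
// Assumption: resources are plain megabyte counts for one partition.
class UserLimitPendingSketch {

    // Headroom left for the user under the computed user limit.
    // Clamped at zero in case usage already exceeds the limit.
    static long userHeadroom(long computedUserLimitMb, long userUsedMb) {
        return Math.max(0L, computedUserLimitMb - userUsedMb);
    }

    // Pending amount that should count toward the queue's pending total:
    // the user's request, but never more than the user's headroom.
    static long pendingCappedByUserLimit(long userPendingMb,
                                         long computedUserLimitMb,
                                         long userUsedMb) {
        return Math.min(userPendingMb,
                        userHeadroom(computedUserLimitMb, userUsedMb));
    }

    public static void main(String[] args) {
        // User limit 30, usage 10, pending 45: only 20 can actually be used.
        System.out.println(pendingCappedByUserLimit(45, 30, 10));
    }
}
```

The queue's max capacity is deliberately left out of this calculation, since it is applied later during the offer stage.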