Details
- Bug
- Status: Closed
- Major
- Resolution: Fixed
- None
- None
- Reviewed
Description
Currently, CapacityScheduler does not always ensure that a ParentQueue stays within its capacity limits. For example:
1) When allocating a container under a parent queue, the scheduler only checks that parentQueue.usage < parentQueue.max. If a leaf queue allocates a container with container.size > (parentQueue.max - parentQueue.usage), the parent queue can exceed its maximum resource limit, as in the following example:
          A (usage=54, max=55)
         /                    \
        A1                     A2
(usage=53, max=53)      (usage=1, max=55)
Queue A2 is able to allocate a container because its usage < max, but if it does, A's usage can exceed A.max.
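The overshoot can be reproduced with a minimal sketch. It models resources as a single int and uses hypothetical helper names (naiveCanAssign, sizeAwareCanAssign) and an assumed container size of 2. It is not the actual CapacityScheduler code, only an illustration of why checking usage < max alone is not sufficient:

```java
// Hypothetical sketch: resources are modelled as a single int.
// These are not the real CapacityScheduler classes or methods.
class ParentLimitSketch {
    /** The check described above: only verifies the queue is currently below its max. */
    static boolean naiveCanAssign(int used, int max) {
        return used < max;
    }

    /** A stricter check that also takes the size of the new container into account. */
    static boolean sizeAwareCanAssign(int used, int max, int containerSize) {
        return used + containerSize <= max;
    }

    public static void main(String[] args) {
        int aUsed = 54, aMax = 55;   // parent queue A
        int a2Used = 1, a2Max = 55;  // leaf queue A2
        int containerSize = 2;       // assumed size, larger than A.max - A.usage = 1

        // Both A and A2 pass the naive check, so the container would be allocated.
        boolean naive = naiveCanAssign(aUsed, aMax) && naiveCanAssign(a2Used, a2Max);
        int aUsedAfter = aUsed + containerSize;
        System.out.println("naive check passes: " + naive
                + ", A.usage after allocation = " + aUsedAfter + " (A.max = " + aMax + ")");

        // Taking the container size into account rejects the allocation at A.
        System.out.println("size-aware check on A passes: "
                + sizeAwareCanAssign(aUsed, aMax, containerSize));
    }
}
```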
2) When doing the continuous reservation check, the parent queue only tells its children "you need to unreserve some resource so that I stay below my maximum resource", but it does not tell them how much resource needs to be unreserved. This can also cause the parent queue to exceed its configured maximum capacity.
With YARN-3099/YARN-3124, we now have a ResourceUsage object in each queue. Here is my proposal:
- ParentQueue will set its children's ResourceUsage.headroom, which is the maximum resource its children can allocate.
- ParentQueue will set its children's headroom to (say the parent's name is "qA"): min(qA.headroom, qA.max - qA.used). This ensures that the capacity of qA's ancestors is enforced as well (qA.headroom is set by qA's parent).
- needToUnReserve is no longer necessary. Instead, children can compute how much resource needs to be unreserved to stay within the parent's resource limit.
- Moreover, with this change, YARN-3026 will make a clear boundary between LeafQueue and FiCaSchedulerApp; headroom will take the user limit, etc. into account.
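The proposed headroom rule can be sketched as follows. This is an illustration only: resources are single ints, the names childHeadroom and amountToUnreserve are hypothetical, and the unreserve formula (request minus headroom, floored at zero) is an assumption about how a child might derive the amount, not something specified in this ticket:

```java
// Hypothetical sketch of the proposal; not the actual ResourceUsage or ParentQueue API.
class HeadroomSketch {
    /** Headroom that parent qA hands to its children: min(qA.headroom, qA.max - qA.used). */
    static int childHeadroom(int parentHeadroom, int parentMax, int parentUsed) {
        return Math.min(parentHeadroom, parentMax - parentUsed);
    }

    /**
     * Assumed rule: the amount a child must unreserve so that a request fits
     * within the headroom it was given; 0 if the request already fits.
     */
    static int amountToUnreserve(int request, int headroom) {
        return Math.max(0, request - headroom);
    }

    public static void main(String[] args) {
        int headroomOfA = 100;            // assumed value set by A's own parent
        int aMax = 55, aUsed = 54;        // queue A from the example in the description

        int headroomForChildren = childHeadroom(headroomOfA, aMax, aUsed);
        System.out.println("headroom passed to A1 and A2: " + headroomForChildren);

        int request = 2;                  // assumed container size
        System.out.println("resource to unreserve for a request of " + request + ": "
                + amountToUnreserve(request, headroomForChildren));
    }
}
```

Because each parent takes the minimum of its own headroom and its remaining capacity, a limit anywhere up the hierarchy caps every descendant without the children having to walk the tree themselves.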
Attachments
Issue Links
- is related to
  - YARN-3251 Fix CapacityScheduler deadlock when computing absolute max avail capacity (short term fix for 2.6.1) - Closed