Details
- Type: Bug
- Status: Open
- Priority: Major
- Resolution: Unresolved
- 3.1.2
- None
- None
Description
If queue capacity is configured in absolute terms, users can't use the total resources of a partition even when yarn.scheduler.capacity.<queue-path>.minimum-user-limit-percent is set to 100.
For example, there are two partitions, A and B. Partition A has (120G memory, 30 vcores) and partition B has (180G memory, 60 vcores). Queue Prod is configured with (75G memory, 25 vcores) of partition A's resources:
yarn.scheduler.capacity.root.Prod.accessible-node-labels.A.capacity=[memory=75Gi,vcores=25]
yarn.scheduler.capacity.root.Prod.accessible-node-labels.A.maximum-capacity=[memory=120Gi,vcores=30]
yarn.scheduler.capacity.root.Prod.minimum-user-limit-percent=100
If at some point the used resource of queue Prod is (90G memory, 10 vcores), users in queue Prod can't get more resources, even though yarn.scheduler.capacity.<queue-path>.minimum-user-limit-percent is set to 100.
The reason is that in computeUserLimit, partitionResource is the base used for comparing consumed and queueCapacity, so in this example (75G memory, 25 vcores) becomes the user limit:
    Resource currentCapacity = Resources.lessThan(resourceCalculator,
        partitionResource, consumed, queueCapacity)
            ? queueCapacity
            : Resources.add(consumed, required);
    Resource userLimitResource = Resources.max(resourceCalculator,
        partitionResource,
        Resources.divideAndCeil(resourceCalculator, resourceUsed,
            usersSummedByWeight),
        Resources.divideAndCeil(resourceCalculator,
            Resources.multiplyAndRoundDown(currentCapacity, getUserLimit()),
            100));
But canAssignToUser evaluates
    Resources.greaterThan(resourceCalculator, clusterResource,
        user.getUsed(nodePartition), limit)
Here clusterResource, not partitionResource, is the base used for comparing used and limit, so canAssignToUser returns false.
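
To make the mismatch concrete, here is a minimal, self-contained sketch of the two comparisons using the numbers from the example. It does not use the Hadoop classes: Res, dominantShare, lessThan and greaterThan are hypothetical stand-ins that assume a dominant-share comparison similar in spirit to DominantResourceCalculator. It also assumes the cluster consists only of partitions A and B, i.e. (300G memory, 90 vcores), and that a single active user holds all of the queue's usage.

```java
// Simplified sketch, NOT the Hadoop implementation. All names here are
// hypothetical stand-ins for Resource / Resources / ResourceCalculator.
class UserLimitSketch {

    /** Simplified resource vector: memory in GiB and vcores. */
    static final class Res {
        final double memGb;
        final double vcores;

        Res(double memGb, double vcores) {
            this.memGb = memGb;
            this.vcores = vcores;
        }
    }

    /** Largest fraction of any single resource type of base that r occupies. */
    static double dominantShare(Res r, Res base) {
        return Math.max(r.memGb / base.memGb, r.vcores / base.vcores);
    }

    /** Assumed dominant-share ordering, relative to the given base. */
    static boolean lessThan(Res base, Res lhs, Res rhs) {
        return dominantShare(lhs, base) < dominantShare(rhs, base);
    }

    static boolean greaterThan(Res base, Res lhs, Res rhs) {
        return dominantShare(lhs, base) > dominantShare(rhs, base);
    }

    public static void main(String[] args) {
        Res partitionA = new Res(120, 30);
        Res cluster = new Res(300, 90);       // assumption: A + B only
        Res queueCapacity = new Res(75, 25);
        Res consumed = new Res(90, 10);       // also the single user's usage

        // computeUserLimit step: base is the partition.
        // consumed: max(90/120, 10/30) = 0.75; capacity: max(75/120, 25/30) = 0.833
        boolean belowQueueCapacity = lessThan(partitionA, consumed, queueCapacity);
        // true, so currentCapacity = queueCapacity and the limit is (75G, 25 vcores)
        Res userLimit = queueCapacity;

        // canAssignToUser step: base is the whole cluster.
        // used: max(90/300, 10/90) = 0.30; limit: max(75/300, 25/90) = 0.278
        boolean exceedsUserLimit = greaterThan(cluster, consumed, userLimit);

        System.out.println("belowQueueCapacity=" + belowQueueCapacity);
        System.out.println("exceedsUserLimit=" + exceedsUserLimit);
    }
}
```

Under these assumptions both checks print true: relative to partition A the queue looks below its capacity (vcores dominate, 25/30), while relative to the whole cluster the user already exceeds the limit (memory dominates, 90/300 versus 25/90), so no further assignment is made.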
Issue Links
- is caused by
  - YARN-5881 [Umbrella] Enable configuration of queue capacity in terms of absolute resources (Resolved)