Details
Description
When w2rRatio is used to compute the fair share, an int overflow can occur and send the computation into an infinite loop.
Because the thread that computes the shares holds the writeLock, it may block the scheduling thread.
This issue occurred in a production environment, and we have already fixed it.
Added 2018-10-29: elaboration of the problem
/**
 * Compute the resources that would be used given a weight-to-resource ratio
 * w2rRatio, for use in the computeFairShares algorithm as described in #
 */
private static int resourceUsedWithWeightToResourceRatio(double w2rRatio,
    Collection<? extends Schedulable> schedulables, String type) {
  int resourcesTaken = 0;
  for (Schedulable sched : schedulables) {
    int share = computeShare(sched, w2rRatio, type);
    resourcesTaken += share;
  }
  return resourcesTaken;
}
The variable resourcesTaken is an int. It accumulates the results of
computeShare(Schedulable sched, double w2rRatio, String type), each of which is a value between the min share and max share of a queue.
For example, when several queues each have min share = max share =
Integer.MAX_VALUE, resourcesTaken can exceed the int range and wrap around to a negative number (with two such queues the sum wraps to -2).
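The wraparound can be reproduced outside the scheduler. The following is a minimal sketch, not the Hadoop code: the class OverflowDemo and the helpers sumAsInt and sumAsLong are hypothetical names, and the shares are passed in directly instead of being computed by computeShare. It sums the same values once in an int, as resourcesTaken does, and once in a long.

```java
public class OverflowDemo {
    // Hypothetical stand-in for the accumulation into resourcesTaken:
    // int addition silently wraps around on overflow.
    static int sumAsInt(int... shares) {
        int total = 0;
        for (int share : shares) {
            total += share;
        }
        return total;
    }

    // Same accumulation in a long, which is wide enough to hold the true sum.
    static long sumAsLong(int... shares) {
        long total = 0;
        for (int share : shares) {
            total += share;
        }
        return total;
    }

    public static void main(String[] args) {
        // Two queues with min share = max share = Integer.MAX_VALUE.
        System.out.println(sumAsInt(Integer.MAX_VALUE, Integer.MAX_VALUE));  // prints -2
        System.out.println(sumAsLong(Integer.MAX_VALUE, Integer.MAX_VALUE)); // prints 4294967294
    }
}
```

Whether the wrapped int ends up negative depends on how many shares are added and how large they are, so the failure would show up only for some queue configurations.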
When resourceUsedWithWeightToResourceRatio(double w2rRatio, Collection<? extends Schedulable> schedulables, String type) returns a negative number, the loop in
computeSharesInternal(), which runs while holding the scheduler lock, may never exit.
//org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.ComputeFairShares
while (resourceUsedWithWeightToResourceRatio(rMax, schedulables, type)
< totalResource)
This may block the scheduling thread.
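One possible way to keep such a loop condition from being stuck is to accumulate in a long and clamp the result to Integer.MAX_VALUE. The sketch below illustrates that idea only and is not necessarily the fix that was applied in Hadoop. The class SaturatingShareSum and the helpers saturatingSum and loopTerminates are hypothetical, shares are assumed to be non-negative, and the loop is simplified to the case where min share = max share, so the used amount stays the same on every iteration.

```java
public class SaturatingShareSum {
    // Hypothetical helper: sums non-negative shares in a long and clamps
    // the result to Integer.MAX_VALUE instead of letting it wrap around.
    static int saturatingSum(int... shares) {
        long total = 0;
        for (int share : shares) {
            total += share;
            if (total >= Integer.MAX_VALUE) {
                return Integer.MAX_VALUE;
            }
        }
        return (int) total;
    }

    // Simplified stand-in for the condition `used < totalResource`.
    // Returns true if the condition becomes false (the loop would exit)
    // within maxIterations, and false if it would keep spinning.
    static boolean loopTerminates(int used, int totalResource, int maxIterations) {
        for (int i = 0; i < maxIterations; i++) {
            if (used >= totalResource) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        int wrapped = Integer.MAX_VALUE + Integer.MAX_VALUE; // int overflow: -2
        int clamped = saturatingSum(Integer.MAX_VALUE, Integer.MAX_VALUE);

        System.out.println(loopTerminates(wrapped, 1000, 1_000_000)); // prints false
        System.out.println(loopTerminates(clamped, 1000, 1_000_000)); // prints true
    }
}
```

Clamping loses the exact sum once it passes Integer.MAX_VALUE, so a design that needs exact totals for very large values would have to carry a long through the whole calculation instead.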
Attachments
Issue Links
- causes: YARN-9173 FairShare calculation broken for large values after YARN-8833 (Resolved)
- links to