  1. Hadoop YARN
  2. YARN-9351

Users can't use the total resources of one partition even when yarn.scheduler.capacity.<queue-path>.minimum-user-limit-percent is set to 100


Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 3.1.2
    • Fix Version/s: None
    • Component/s: capacityscheduler
    • Labels: None

    Description

      If queue capacity is configured in absolute terms, users cannot use the total resources of one partition, even when yarn.scheduler.capacity.<queue-path>.minimum-user-limit-percent is set to 100.

      For example, suppose there are two partitions, A and B. Partition A has (120G memory, 30 vcores) and partition B has (180G memory, 60 vcores). Queue Prod is configured with (75G memory, 25 vcores) of partition A's resources:

      yarn.scheduler.capacity.root.Prod.accessible-node-labels.A.capacity=[memory=75Gi,vcores=25]
      yarn.scheduler.capacity.root.Prod.accessible-node-labels.A.maximum-capacity=[memory=120Gi,vcores=30]
      yarn.scheduler.capacity.root.Prod.minimum-user-limit-percent=100

      Suppose that at some point the used resources of queue Prod are (90G memory, 10 vcores). Even though minimum-user-limit-percent is set to 100, users in queue Prod cannot get more resources from partition A.

       

      The reason is that computeUserLimit uses partitionResource as the base when comparing consumed with queueCapacity. In the example, consumed compares as less than queueCapacity, so currentCapacity is queueCapacity and the user limit is (75G memory, 25 vcores):

      Resource currentCapacity = Resources.lessThan(resourceCalculator,
          partitionResource, consumed, queueCapacity)
              ? queueCapacity
              : Resources.add(consumed, required);

      Resource userLimitResource = Resources.max(resourceCalculator,
          partitionResource,
          Resources.divideAndCeil(resourceCalculator, resourceUsed,
              usersSummedByWeight),
          Resources.divideAndCeil(resourceCalculator,
              Resources.multiplyAndRoundDown(currentCapacity, getUserLimit()),
              100));
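
      The arithmetic behind the currentCapacity step can be sketched as below. This is only an illustration under stated assumptions: it assumes a dominant-share style comparison (as DominantResourceCalculator performs), and the class UserLimitSketch, its Res type, and the `required` value of (1G memory, 1 vcore) are hypothetical stand-ins, not Hadoop classes or values from this report.

```java
// Simplified model of a dominant-share comparison. Hypothetical stand-in,
// not the Hadoop Resources/ResourceCalculator classes.
final class UserLimitSketch {

    /** Resource vector: memory in GiB and vcores. */
    static final class Res {
        final long memGi;
        final long vcores;

        Res(long memGi, long vcores) {
            this.memGi = memGi;
            this.vcores = vcores;
        }

        @Override
        public String toString() {
            return "(" + memGi + "G memory, " + vcores + " vcores)";
        }
    }

    /** Largest per-resource share of r relative to base. */
    static double dominantShare(Res base, Res r) {
        return Math.max((double) r.memGi / base.memGi,
                        (double) r.vcores / base.vcores);
    }

    /** True if lhs has a smaller dominant share of base than rhs. */
    static boolean lessThan(Res base, Res lhs, Res rhs) {
        return dominantShare(base, lhs) < dominantShare(base, rhs);
    }

    /** Mirrors the currentCapacity ternary quoted in the description. */
    static Res currentCapacity(Res partition, Res consumed,
                               Res queueCapacity, Res required) {
        return lessThan(partition, consumed, queueCapacity)
            ? queueCapacity
            : new Res(consumed.memGi + required.memGi,
                      consumed.vcores + required.vcores);
    }

    public static void main(String[] args) {
        Res partitionA = new Res(120, 30);    // partition A from the report
        Res queueCapacity = new Res(75, 25);  // Prod capacity on partition A
        Res consumed = new Res(90, 10);       // Prod usage from the report
        Res required = new Res(1, 1);         // assumed, not from the report

        // consumed: memory dominates, 90/120 = 0.75
        System.out.println(dominantShare(partitionA, consumed));
        // queueCapacity: vcores dominate, 25/30
        System.out.println(dominantShare(partitionA, queueCapacity));
        // prints (75G memory, 25 vcores)
        System.out.println(
            currentCapacity(partitionA, consumed, queueCapacity, required));
    }
}
```

      Under these assumptions, memory usage already exceeds the configured 75G, but the vcores share of queueCapacity (25/30) is larger than the memory share of consumed (90/120), so consumed still compares as less than queueCapacity. With minimum-user-limit-percent=100, the limit derived from currentCapacity is then (75G memory, 25 vcores).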

       

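      However, canAssignToUser checks the user's usage against that limit with

      Resources.greaterThan(resourceCalculator, clusterResource,
          user.getUsed(nodePartition), limit)

      Here clusterResource, not partitionResource, is the base for comparing used and limit. Relative to the whole cluster, the user's usage compares as greater than the limit, so the result of canAssignToUser is false.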
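
      The effect of the two different comparison bases can be illustrated with the numbers from the example. This is a sketch under assumptions, not the scheduler's code: it assumes a dominant-share comparison, a cluster made up of exactly partitions A and B, i.e. (300G memory, 90 vcores), and a single user holding all of queue Prod's usage. The class CanAssignSketch and its methods are hypothetical names.

```java
// Hypothetical sketch of the user-limit check with two different bases.
// Resources are {memory in GiB, vcores}; this is not Hadoop code.
final class CanAssignSketch {

    /** Largest per-resource share of r relative to base. */
    static double dominantShare(long[] base, long[] r) {
        return Math.max((double) r[0] / base[0], (double) r[1] / base[1]);
    }

    /** True if lhs has a larger dominant share of base than rhs. */
    static boolean greaterThan(long[] base, long[] lhs, long[] rhs) {
        return dominantShare(base, lhs) > dominantShare(base, rhs);
    }

    /** The user may be assigned more only if used does not exceed limit. */
    static boolean canAssignToUser(long[] base, long[] used, long[] limit) {
        return !greaterThan(base, used, limit);
    }

    public static void main(String[] args) {
        long[] cluster = {300, 90};    // assumed: partition A + partition B
        long[] partitionA = {120, 30};
        long[] used = {90, 10};        // assumed: one user holds all usage
        long[] limit = {75, 25};       // user limit from computeUserLimit

        // Cluster base: used share 90/300 = 0.3 exceeds limit share 25/90.
        System.out.println(canAssignToUser(cluster, used, limit));    // false
        // Partition base: used share 90/120 is below limit share 25/30.
        System.out.println(canAssignToUser(partitionA, used, limit)); // true
    }
}
```

      With the cluster as the base, memory becomes the larger share of used (0.3) while the limit's largest share is only about 0.28, so the check rejects the user. With the partition as the base, the same used and limit values pass the check, which is the inconsistency described above.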

            People

              Assignee: Juanjuan Tian (jutia)
              Reporter: Juanjuan Tian (jutia)