Hadoop YARN / YARN-8020 (sub-task of YARN-8159: [Umbrella] Fixes for Multiple Resource Type Preemption in Capacity Scheduler)

when DRF is used, preemption does not trigger due to incorrect idealAssigned


Details

    • Type: Sub-task
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved

    Description

      I’ve run into a case where Inter Queue Preemption does not work.
      It happens when DRF is used and an application requesting a large number of vcores is submitted.

      IMHO, idealAssigned can be set incorrectly by the following code (TempQueuePerPartition#offer):

      // This function "accepts" all the resources it can (pending) and return
      // the unused ones
      Resource offer(Resource avail, ResourceCalculator rc,
          Resource clusterResource, boolean considersReservedResource) {
        Resource absMaxCapIdealAssignedDelta = Resources.componentwiseMax(
            Resources.subtract(getMax(), idealAssigned),
            Resource.newInstance(0, 0));
        // accepted = min{avail,
        //               max - assigned,
        //               current + pending - assigned,
        //               # Make sure a queue will not get more than max of its
        //               # used/guaranteed, this is to make sure preemption won't
        //               # happen if all active queues are beyond their guaranteed
        //               # This is for leaf queue only.
        //               max(guaranteed, used) - assigned}
        // remain = avail - accepted
        Resource accepted = Resources.min(rc, clusterResource,
            absMaxCapIdealAssignedDelta,
            Resources.min(rc, clusterResource, avail, Resources
                /*
                 * When we're using FifoPreemptionSelector (considerReservedResource
                 * = false).
                 *
                 * We should deduct reserved resource from pending to avoid excessive
                 * preemption:
                 *
                 * For example, if an under-utilized queue has used = reserved = 20.
                 * Preemption policy will try to preempt 20 containers (which is not
                 * satisfied) from different hosts.
                 *
                 * In FifoPreemptionSelector, there's no guarantee that preempted
                 * resource can be used by the pending request, so the policy will
                 * preempt resources repeatedly.
                 */
                .subtract(Resources.add(getUsed(),
                    (considersReservedResource ? pending : pendingDeductReserved)),
                    idealAssigned)));
      

      Let’s say:

      • cluster resource: <Memory:200GB, VCores:20>
      • idealAssigned (assigned): <Memory:100GB, VCores:10>
      • avail: <Memory:181GB, VCores:1>
      • current: <Memory:19GB, VCores:19>
      • pending: <Memory:0GB, VCores:0>

      current + pending - assigned: <Memory:-81GB, VCores:9>

      With the DominantResourceCalculator, Resources.min compares the two vectors by their dominant shares and returns one of them whole; it is not a componentwise minimum. The dominant share of <Memory:-81GB, VCores:9> is 9/20 = 0.45, while that of avail is 181/200 = 0.905, so the whole negative-memory vector is chosen:

      min ( avail, (current + pending - assigned) ): <Memory:-81GB, VCores:9>
      accepted: <Memory:-81GB, VCores:9>

      As a result, idealAssigned becomes <Memory:19GB, VCores:19>, which matches the queue's current usage even though it has no pending demand, so preemption is not triggered.
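
      The vector selection above can be reproduced outside the scheduler with the public Resources and DominantResourceCalculator utilities. The following is only a minimal sketch of the Resources.min behavior using the numbers from this example; the DrfMinDemo class name is made up for illustration, and memory is expressed in MB:

      import org.apache.hadoop.yarn.api.records.Resource;
      import org.apache.hadoop.yarn.util.resource.DominantResourceCalculator;
      import org.apache.hadoop.yarn.util.resource.ResourceCalculator;
      import org.apache.hadoop.yarn.util.resource.Resources;

      public class DrfMinDemo {
        public static void main(String[] args) {
          ResourceCalculator rc = new DominantResourceCalculator();
          Resource cluster  = Resource.newInstance(200 * 1024, 20); // <Memory:200GB, VCores:20>
          Resource avail    = Resource.newInstance(181 * 1024, 1);  // <Memory:181GB, VCores:1>
          Resource current  = Resource.newInstance(19 * 1024, 19);  // <Memory:19GB, VCores:19>
          Resource pending  = Resource.newInstance(0, 0);
          Resource assigned = Resource.newInstance(100 * 1024, 10); // idealAssigned so far

          // current + pending - assigned = <Memory:-81GB, VCores:9>
          Resource delta = Resources.subtract(Resources.add(current, pending), assigned);

          // Resources.min with DRF returns one of the two vectors whole, chosen by
          // comparing dominant shares: delta -> max(-81/200, 9/20) = 0.45,
          // avail -> max(181/200, 1/20) = 0.905. The negative-memory delta wins
          // and becomes "accepted".
          Resource accepted = Resources.min(rc, cluster, avail, delta);

          System.out.println("delta    = " + delta);    // memory component is negative
          System.out.println("accepted = " + accepted); // same vector as delta
        }
      }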


    People

    • Assignee: Unassigned
    • Reporter: kyungwan nam
