Details

    • Type: Sub-task
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: None
    • Labels:
      None

      Description

      I've found that inter-queue preemption does not work.
      It happens when DRF is used and an application requesting a large number of vcores is submitted.

      IMHO, idealAssigned can be set incorrectly by the following code.

      // This function "accepts" all the resources it can (pending) and return
      // the unused ones
      Resource offer(Resource avail, ResourceCalculator rc,
          Resource clusterResource, boolean considersReservedResource) {
        Resource absMaxCapIdealAssignedDelta = Resources.componentwiseMax(
            Resources.subtract(getMax(), idealAssigned),
            Resource.newInstance(0, 0));
        // accepted = min{avail,
        //               max - assigned,
        //               current + pending - assigned,
        //               # Make sure a queue will not get more than max of its
        //               # used/guaranteed, this is to make sure preemption won't
        //               # happen if all active queues are beyond their guaranteed
        //               # This is for leaf queue only.
        //               max(guaranteed, used) - assigned}
        // remain = avail - accepted
        Resource accepted = Resources.min(rc, clusterResource,
            absMaxCapIdealAssignedDelta,
            Resources.min(rc, clusterResource, avail, Resources
                /*
                 * When we're using FifoPreemptionSelector (considerReservedResource
                 * = false).
                 *
                 * We should deduct reserved resource from pending to avoid excessive
                 * preemption:
                 *
                 * For example, if an under-utilized queue has used = reserved = 20.
                 * Preemption policy will try to preempt 20 containers (which is not
                 * satisfied) from different hosts.
                 *
                 * In FifoPreemptionSelector, there's no guarantee that preempted
                 * resource can be used by pending request, so policy will preempt
                 * resources repeatedly.
                 */
                .subtract(Resources.add(getUsed(),
                    (considersReservedResource ? pending : pendingDeductReserved)),
                    idealAssigned)));
      

      Let's say:

      • cluster resource: <Memory:200GB, VCores:20>
      • idealAssigned (assigned): <Memory:100GB, VCores:10>
      • avail: <Memory:181GB, VCores:1>
      • current: <Memory:19GB, VCores:19>
      • pending: <Memory:0, VCores:0>

      current + pending - assigned: <Memory:-81GB, VCores:9>
      min(avail, (current + pending - assigned)): <Memory:-81GB, VCores:9>
      accepted: <Memory:-81GB, VCores:9>

      As a result, 9 vcores are accepted although only 1 is available, and idealAssigned becomes <Memory:19GB, VCores:19>, which equals the queue's current usage and so does not trigger preemption.
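
The arithmetic above can be reproduced with a small standalone sketch. It does not use the Hadoop classes: `Res`, `dominantShare`, `drfMin` and `componentwiseMin` are hypothetical stand-ins written for this illustration, and `drfMin` only approximates a dominant-share comparison (no tie-breaking on the second resource). The `max - assigned` term is left out because the example gives no value for max. Plugging in the listed inputs, the memory component of current + pending - assigned is 19GB + 0 - 100GB = -81GB, and because the DRF-style min returns one whole vector rather than a per-component minimum, the negative memory and the 9 vcores are accepted together.

```java
public class OfferSketch {
    /** Hypothetical stand-in for a YARN Resource: memory in GB plus vcores. */
    public static final class Res {
        public final long memGb;
        public final long vcores;

        public Res(long memGb, long vcores) {
            this.memGb = memGb;
            this.vcores = vcores;
        }

        public Res plus(Res o) {
            return new Res(memGb + o.memGb, vcores + o.vcores);
        }

        // Like the subtraction in the quoted code, this may go negative.
        public Res minus(Res o) {
            return new Res(memGb - o.memGb, vcores - o.vcores);
        }

        @Override
        public String toString() {
            return "<Memory:" + memGb + "GB, VCores:" + vcores + ">";
        }
    }

    /** Larger of the two per-resource fractions of the cluster. */
    public static double dominantShare(Res r, Res cluster) {
        return Math.max((double) r.memGb / cluster.memGb,
                        (double) r.vcores / cluster.vcores);
    }

    /** Simplified DRF-style min: picks the whole vector with the smaller dominant share. */
    public static Res drfMin(Res cluster, Res a, Res b) {
        return dominantShare(a, cluster) <= dominantShare(b, cluster) ? a : b;
    }

    /** Per-component min, shown only for contrast; not the project's fix. */
    public static Res componentwiseMin(Res a, Res b) {
        return new Res(Math.min(a.memGb, b.memGb), Math.min(a.vcores, b.vcores));
    }

    public static void main(String[] args) {
        Res cluster = new Res(200, 20);
        Res idealAssigned = new Res(100, 10);
        Res avail = new Res(181, 1);
        Res current = new Res(19, 19);
        Res pending = new Res(0, 0);

        // current + pending - assigned
        Res wanted = current.plus(pending).minus(idealAssigned);
        // avail has dominant share 0.905 (memory), wanted has 0.45 (vcores),
        // so the DRF-style min returns wanted as a whole.
        Res accepted = drfMin(cluster, avail, wanted);

        System.out.println("wanted        = " + wanted);
        System.out.println("accepted      = " + accepted);
        System.out.println("idealAssigned = " + idealAssigned.plus(accepted));
        System.out.println("componentwise = " + componentwiseMin(avail, wanted));
    }
}
```

In this sketch the accepted vcores (9) exceed the available vcores (1), which is the symptom described above. The per-component minimum is printed only to show that it would cap vcores at 1; it is a contrast, not a proposed patch.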

            People

            • Assignee:
              Unassigned
            • Reporter:
              kyungwan nam
            • Votes:
              0
            • Watchers:
              4
