  1. Hadoop YARN
  2. YARN-6818

User limit per partition is not honored in branch-2.7 >=

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • 2.7.4
    • 2.7.4
    • None
    • None
    • Reviewed

    Description

      We are seeing an issue where the user limit factor does not cap the amount of resources a user can consume in a queue within a partition. Suppose a queue has access to partition X, the user's used resources in the default partition are 0, and the user's used resources in partition X are at the partition's user limit. As far as I can tell, this is the problematic code (in LeafQueue.java):

          if (Resources
              .greaterThan(resourceCalculator, clusterResource,
                  user.getUsed(label),
                  limit)) {
            // if enabled, check to see if could we potentially use this node instead
            // of a reserved node if the application has reserved containers
            if (this.reservationsContinueLooking) {
              if (Resources.lessThanOrEqual(
                  resourceCalculator,
                  clusterResource,
                  Resources.subtract(user.getUsed(), application.getCurrentReservation()),
                  limit)) {
      
                if (LOG.isDebugEnabled()) {
                  LOG.debug("User " + userName + " in queue " + getQueueName()
                      + " will exceed limit based on reservations - " + " consumed: "
                      + user.getUsed() + " reserved: "
                      + application.getCurrentReservation() + " limit: " + limit);
                }
                Resource amountNeededToUnreserve = Resources.subtract(user.getUsed(label), limit);
                // we can only acquire a new container if we unreserve first since we ignored the
                // user limit. Choose the max of user limit or what was previously set by max
                // capacity.
                currentResoureLimits.setAmountNeededUnreserve(Resources.max(resourceCalculator,
                    clusterResource, currentResoureLimits.getAmountNeededUnreserve(),
                    amountNeededToUnreserve));
                return true;
              }
            }
            if (LOG.isDebugEnabled()) {
              LOG.debug("User " + userName + " in queue " + getQueueName()
                  + " will exceed limit - " + " consumed: "
                  + user.getUsed() + " limit: " + limit);
            }
            return false;
          }
      

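      First, the outer check sees that the used resources in partition X are greater than the partition's user limit. The reservation check then also succeeds, because it evaluates user.getUsed() - application.getCurrentReservation() <= limit. Here user.getUsed() is called without a label, so it reads the usage in the default partition (0 in this scenario) rather than in partition X, and the method returns true.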

      One fix is to change Resources.subtract(user.getUsed(), application.getCurrentReservation()) to Resources.subtract(user.getUsed(label), application.getCurrentReservation()), so that the reservation check reads usage from the same partition as the outer check.
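
      To make the difference concrete, here is a minimal, self-contained sketch of the two comparisons. It is not the LeafQueue code: resources are reduced to a single long (MB), the class and method names (UserLimitSketch, UserUsage, canAssignBuggy, canAssignFixed) are hypothetical, the numbers are invented for illustration, and reservationsContinueLooking is assumed to be enabled.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified model of the user-limit check; NOT the real LeafQueue code.
final class UserLimitSketch {
    // Assumption for this sketch: the default partition is keyed by the empty label.
    static final String NO_LABEL = "";

    /** Hypothetical stand-in for a user's per-partition usage, in MB. */
    static final class UserUsage {
        private final Map<String, Long> usedByPartition = new HashMap<>();

        void setUsed(String label, long mb) { usedByPartition.put(label, mb); }

        long getUsed(String label) { return usedByPartition.getOrDefault(label, 0L); }

        // Without a label, usage is read from the default partition.
        long getUsed() { return getUsed(NO_LABEL); }
    }

    /** Mirrors the reported behavior: the inner check reads the default partition. */
    static boolean canAssignBuggy(UserUsage user, String label, long reservedMb, long limitMb) {
        if (user.getUsed(label) > limitMb) {
            // Reads user.getUsed(), i.e. the default partition, not 'label'.
            return user.getUsed() - reservedMb <= limitMb;
        }
        return true;
    }

    /** Mirrors the proposed fix: the inner check reads the same partition as the outer one. */
    static boolean canAssignFixed(UserUsage user, String label, long reservedMb, long limitMb) {
        if (user.getUsed(label) > limitMb) {
            return user.getUsed(label) - reservedMb <= limitMb;
        }
        return true;
    }

    public static void main(String[] args) {
        UserUsage user = new UserUsage();
        user.setUsed("X", 5120L);   // usage in partition X exceeds the limit
        long limitMb = 4096L;       // default partition usage stays at 0

        System.out.println("buggy: " + canAssignBuggy(user, "X", 0L, limitMb)); // buggy: true
        System.out.println("fixed: " + canAssignFixed(user, "X", 0L, limitMb)); // fixed: false
    }
}
```

      With no reservations in partition X, the label-aware version rejects the assignment, while the original comparison lets it through because it subtracts from the default partition's usage of 0.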

      This doesn't seem to be a problem in branch-2.8 and higher, since YARN-3356 introduced this check:

            if (this.reservationsContinueLooking && checkReservations
                && label.equals(CommonNodeLabelsManager.NO_LABEL)) {

      With this guard, the reservation path only runs when the label is the default partition, so in this case getting the used resources in the default partition seems to be correct.
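
      The effect of that guard can be sketched in the same simplified terms. Again, this is a model with hypothetical names (NoLabelGuardSketch, reservationPathApplies, canAssign) and plain long values, not the branch-2.8 implementation; NO_LABEL is assumed here to be the empty string.

```java
// Simplified model of the label guard quoted above; NOT the branch-2.8 code.
final class NoLabelGuardSketch {
    // Assumption for this sketch: the default partition is the empty label.
    static final String NO_LABEL = "";

    /** The reservation path is only considered for the default partition. */
    static boolean reservationPathApplies(boolean reservationsContinueLooking,
                                          boolean checkReservations,
                                          String label) {
        return reservationsContinueLooking && checkReservations && label.equals(NO_LABEL);
    }

    /**
     * usedInLabelMb: usage in the partition being scheduled.
     * usedDefaultMb: usage in the default partition.
     */
    static boolean canAssign(long usedInLabelMb, long usedDefaultMb, long reservedMb,
                             long limitMb, String label,
                             boolean reservationsContinueLooking, boolean checkReservations) {
        if (usedInLabelMb > limitMb) {
            if (reservationPathApplies(reservationsContinueLooking, checkReservations, label)) {
                // Only reached for NO_LABEL, where usedDefaultMb is the relevant usage.
                if (usedDefaultMb - reservedMb <= limitMb) {
                    return true;
                }
            }
            return false;
        }
        return true;
    }

    public static void main(String[] args) {
        // Partition X over its limit: the reservation path is skipped, so the limit holds.
        System.out.println(canAssign(5120L, 0L, 0L, 4096L, "X", true, true)); // false
    }
}
```

      Because the reservation branch is skipped for any non-default label, an over-limit user in partition X falls through to the rejection, which matches the observation that later branches are unaffected.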

      Attachments

        1. YARN-6818-branch-2.7.002.patch
          10 kB
          Jonathan Hung
        2. YARN-6818-branch-2.7.001.patch
          10 kB
          Jonathan Hung

          People

            Assignee: Jonathan Hung (jhung)
            Reporter: Jonathan Hung (jhung)