Uploaded image for project: 'Hadoop YARN'
  1. Hadoop YARN
  2. YARN-6818

User limit per partition is not honored in branch-2.7 >=

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.7.4
    • Fix Version/s: 2.7.4
    • Component/s: None
    • Labels:
      None
    • Target Version/s:
    • Hadoop Flags:
      Reviewed

      Description

      We are seeing an issue where user limit factor does not cap the amount of resources a user can consume in a queue in a partition. Suppose you have a queue with access to partition X, used resources in default partition is 0, and used resources in partition X is at the partition's user limit. This is the problematic code as far as I can tell: (in LeafQueue.java)

          if (Resources
              .greaterThan(resourceCalculator, clusterResource,
                  user.getUsed(label),
                  limit)) {
            // if enabled, check to see if could we potentially use this node instead
            // of a reserved node if the application has reserved containers
            if (this.reservationsContinueLooking) {
              if (Resources.lessThanOrEqual(
                  resourceCalculator,
                  clusterResource,
                  Resources.subtract(user.getUsed(), application.getCurrentReservation()),
                  limit)) {
      
                if (LOG.isDebugEnabled()) {
                  LOG.debug("User " + userName + " in queue " + getQueueName()
                      + " will exceed limit based on reservations - " + " consumed: "
                      + user.getUsed() + " reserved: "
                      + application.getCurrentReservation() + " limit: " + limit);
                }
                Resource amountNeededToUnreserve = Resources.subtract(user.getUsed(label), limit);
                // we can only acquire a new container if we unreserve first since we ignored the
                // user limit. Choose the max of user limit or what was previously set by max
                // capacity.
                currentResoureLimits.setAmountNeededUnreserve(Resources.max(resourceCalculator,
                    clusterResource, currentResoureLimits.getAmountNeededUnreserve(),
                    amountNeededToUnreserve));
                return true;
              }
            }
            if (LOG.isDebugEnabled()) {
              LOG.debug("User " + userName + " in queue " + getQueueName()
                  + " will exceed limit - " + " consumed: "
                  + user.getUsed() + " limit: " + limit);
            }
            return false;
          }
      

      First it sees the used resources in partition X is greater than partition's user limit. Then the reservation check also succeeds because it is checking user.getUsed() - application.getCurrentReservation() <= limit and returns true.

      One fix is to just set Resources.subtract(user.getUsed(), application.getCurrentReservation()) to Resources.subtract(user.getUsed(label), application.getCurrentReservation()).

      This doesn't seem to be a problem in branch-2.8 and higher since YARN-3356 introduces this check:

            if (this.reservationsContinueLooking && checkReservations
                && label.equals(CommonNodeLabelsManager.NO_LABEL)) {

      so in this case getting the used resources in default partition seems to be correct.

      1. YARN-6818-branch-2.7.001.patch
        10 kB
        Jonathan Hung
      2. YARN-6818-branch-2.7.002.patch
        10 kB
        Jonathan Hung

        Activity

        Hide
        jhung Jonathan Hung added a comment -

        Great, thanks Konstantin Shvachko for commit and Naganarasimha G R/Sunil G for reviews!

        Show
        jhung Jonathan Hung added a comment - Great, thanks Konstantin Shvachko for commit and Naganarasimha G R / Sunil G for reviews!
        Hide
        shv Konstantin Shvachko added a comment -

        I just committed this to branch-2.7. Thank you Jonathan Hung.

        Show
        shv Konstantin Shvachko added a comment - I just committed this to branch-2.7. Thank you Jonathan Hung .
        Hide
        Naganarasimha Naganarasimha G R added a comment -

        Thanks Sunil G for the reviews and Jonathan Hung for working on it.
        +1 LGTM, patch can be committed.

        Show
        Naganarasimha Naganarasimha G R added a comment - Thanks Sunil G for the reviews and Jonathan Hung for working on it. +1 LGTM, patch can be committed.
        Hide
        sunilg Sunil G added a comment -

        Yes. Thanks Jonathan Hung.
        I will wait for Naganarasimha Garla as well.

        Show
        sunilg Sunil G added a comment - Yes. Thanks Jonathan Hung . I will wait for Naganarasimha Garla as well.
        Hide
        jhung Jonathan Hung added a comment -

        Thanks Sunil G for the review! Added 002 patch to pass in NO_LABEL instead of null. Not sure if this was the cleanup in test case setup that you were referring to.

        Show
        jhung Jonathan Hung added a comment - Thanks Sunil G for the review! Added 002 patch to pass in NO_LABEL instead of null. Not sure if this was the cleanup in test case setup that you were referring to.
        Hide
        sunilg Sunil G added a comment -

        Thanks Jonathan Hung. Good catch! A quick comment in test case

        Its better to pass CommonNodeLabelsManager.NO_LABEL or a final string with "" as value instead of null in case of no-label scenario. In CSQueueUtils.loadCapacitiesByLabelsFromConf, I think its been done with an if..else case. It may be better this way for a clean way of setting up test case.

        Show
        sunilg Sunil G added a comment - Thanks Jonathan Hung . Good catch! A quick comment in test case Its better to pass CommonNodeLabelsManager.NO_LABEL or a final string with "" as value instead of null in case of no-label scenario. In CSQueueUtils.loadCapacitiesByLabelsFromConf , I think its been done with an if..else case. It may be better this way for a clean way of setting up test case.
        Hide
        jhung Jonathan Hung added a comment -

        Hi, Naganarasimha G R, attached a patch for branch-2.7.

        Show
        jhung Jonathan Hung added a comment - Hi, Naganarasimha G R , attached a patch for branch-2.7.
        Hide
        Naganarasimha Naganarasimha G R added a comment -

        Hi Jonathan Hung,
        Would you mind uploading a patch ?

        Show
        Naganarasimha Naganarasimha G R added a comment - Hi Jonathan Hung , Would you mind uploading a patch ?

          People

          • Assignee:
            jhung Jonathan Hung
            Reporter:
            jhung Jonathan Hung
          • Votes:
            0 Vote for this issue
            Watchers:
            5 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development