Hadoop YARN / YARN-4477

FairScheduler: Handle condition which can result in an infinite loop in attemptScheduling.

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.8.0, 3.0.0-alpha1
    • Component/s: fairscheduler
    • Labels:
      None

      Description

      This problem was introduced by YARN-4270, which added a limit on reservations.
      In FSAppAttempt.reserve():

      // When reservationExceedsThreshold() is true, this whole block is skipped
      // and no reservation is recorded on the node.
      if (!reservationExceedsThreshold(node, type)) {
        LOG.info("Making reservation: node=" + node.getNodeName() +
            " app_id=" + getApplicationId());
        if (!alreadyReserved) {
          getMetrics().reserveResource(getUser(), container.getResource());
          RMContainer rmContainer =
              super.reserve(node, priority, null, container);
          node.reserveResource(this, priority, rmContainer);
          setReservation(node);
        } else {
          RMContainer rmContainer = node.getReservedContainer();
          super.reserve(node, priority, rmContainer, container);
          node.reserveResource(this, priority, rmContainer);
          setReservation(node);
        }
      }

      If the reservation would exceed the threshold, no reservation is set on the current node.
      However, in FairScheduler.attemptScheduling():

      while (node.getReservedContainer() == null) {
        boolean assignedContainer = false;
        if (!queueMgr.getRootQueue().assignContainer(node).equals(
            Resources.none())) {
          assignedContainers++;
          assignedContainer = true;
        }
        if (!assignedContainer) { break; }
        if (!assignMultiple) { break; }
        if ((assignedContainers >= maxAssign) && (maxAssign > 0)) { break; }
      }

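      assignContainer(node) still returns FairScheduler.CONTAINER_RESERVED, which is not
      equal to Resources.none(), so the loop counts it as an assigned container. Because no
      reservation was actually set, node.getReservedContainer() stays null and the while
      condition remains true. As a result, if multiple assignment is enabled and maxAssign is
      unlimited (maxAssign <= 0), this while loop never exits.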
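      To make the failure mode concrete, here is a minimal Java sketch that models the loop
      above outside Hadoop. It is not FairScheduler code: ModelNode, runLoop, the guard
      parameter, and the use of ints (0 for Resources.none(), -1 for CONTAINER_RESERVED) are
      hypothetical simplifications. The guard exists only so the demonstration terminates.

```java
import java.util.function.IntSupplier;

/**
 * Hypothetical model of the attemptScheduling() loop (not Hadoop code).
 * Ints stand in for Resource objects: NONE plays Resources.none(),
 * CONTAINER_RESERVED plays FairScheduler.CONTAINER_RESERVED.
 */
final class AttemptSchedulingLoopModel {
    static final int NONE = 0;
    static final int CONTAINER_RESERVED = -1;

    /** Stand-in for the scheduler node; only tracks whether a reservation exists. */
    static final class ModelNode {
        Object reservedContainer = null;
    }

    /**
     * Same control flow as the loop in the report, plus an artificial guard
     * (absent in FairScheduler) so this demo cannot spin forever.
     * Returns the number of completed passes.
     */
    static int runLoop(ModelNode node, IntSupplier assignContainer,
                       boolean assignMultiple, int maxAssign, int guard) {
        int assignedContainers = 0;
        int passes = 0;
        while (node.reservedContainer == null) {
            if (passes == guard) {
                break; // artificial guard: the real loop has no such exit
            }
            passes++;
            boolean assignedContainer = false;
            // CONTAINER_RESERVED != NONE, so a skipped reservation counts as "assigned".
            if (assignContainer.getAsInt() != NONE) {
                assignedContainers++;
                assignedContainer = true;
            }
            if (!assignedContainer) { break; }
            if (!assignMultiple) { break; }
            if ((assignedContainers >= maxAssign) && (maxAssign > 0)) { break; }
        }
        return passes;
    }

    public static void main(String[] args) {
        // Threshold exceeded: the node never gets a reservation,
        // but CONTAINER_RESERVED is returned on every pass.
        int passes = runLoop(new ModelNode(), () -> CONTAINER_RESERVED, true, -1, 1_000);
        System.out.println("passes: " + passes); // 1000: only the guard stopped the loop
    }
}
```

      With assignMultiple enabled and maxAssign <= 0, the only exit in this model is the
      artificial guard. With assignMultiple disabled, or a positive maxAssign, the loop stops
      on its own after one pass or after maxAssign passes respectively.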

      I suggest that assignContainer(node) should return Resources.none() rather than CONTAINER_RESERVED when the attempt does not make the reservation because of the limit.
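      The proposed change can be sketched in a simplified, hypothetical model where ints stand
      in for Resource objects (0 for Resources.none(), -1 for CONTAINER_RESERVED).
      reserveOrSkip is an illustrative stand-in for the end of assignContainer(node) on the
      reservation path; it is an assumption about the shape of the fix, not the code in the
      attached patches.

```java
import java.util.function.IntSupplier;

/** Hypothetical model of the proposed behavior (not Hadoop code). */
final class ProposedFixModel {
    static final int NONE = 0;                 // plays Resources.none()
    static final int CONTAINER_RESERVED = -1;  // plays FairScheduler.CONTAINER_RESERVED

    static final class ModelNode {
        Object reservedContainer = null;
    }

    /**
     * If the reservation threshold is exceeded, nothing is reserved and NONE is
     * returned (the proposed change). Otherwise a reservation is recorded on the
     * node and CONTAINER_RESERVED is returned.
     */
    static int reserveOrSkip(ModelNode node, boolean reservationExceedsThreshold) {
        if (reservationExceedsThreshold) {
            return NONE;
        }
        node.reservedContainer = new Object(); // stands in for node.reserveResource(...)
        return CONTAINER_RESERVED;
    }

    /** Same control flow as the attemptScheduling() loop; returns the number of passes. */
    static int runLoop(ModelNode node, IntSupplier assignContainer,
                       boolean assignMultiple, int maxAssign) {
        int assignedContainers = 0;
        int passes = 0;
        while (node.reservedContainer == null) {
            passes++;
            boolean assignedContainer = false;
            if (assignContainer.getAsInt() != NONE) {
                assignedContainers++;
                assignedContainer = true;
            }
            if (!assignedContainer) { break; }
            if (!assignMultiple) { break; }
            if ((assignedContainers >= maxAssign) && (maxAssign > 0)) { break; }
        }
        return passes;
    }

    public static void main(String[] args) {
        // Threshold exceeded: NONE is returned, so the loop breaks after one pass.
        ModelNode skipped = new ModelNode();
        System.out.println(runLoop(skipped, () -> reserveOrSkip(skipped, true), true, -1));

        // Reservation made: the while condition fails after one pass.
        ModelNode reserved = new ModelNode();
        System.out.println(runLoop(reserved, () -> reserveOrSkip(reserved, false), true, -1));
    }
}
```

      In both branches the loop now has a real exit even with assignMultiple enabled and an
      unlimited maxAssign: either the returned value equals NONE, or a reservation is recorded
      and node.getReservedContainer() becomes non-null.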

        Attachments

        1. YARN-4477.001.patch (2 kB, Tao Jie)
        2. YARN-4477.002.patch (5 kB, Tao Jie)
        3. YARN-4477.003.patch (5 kB, Tao Jie)
        4. YARN-4477.004.patch (5 kB, Tao Jie)


              People

              • Assignee: Tao Jie
              • Reporter: Tao Jie
              • Votes: 0
              • Watchers: 14
