Hadoop YARN / YARN-4477

FairScheduler: Handle condition which can result in an infinite loop in attemptScheduling.


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.8.0, 3.0.0-alpha1
    • Component/s: fairscheduler
    • Labels: None

    Description

      This problem was introduced by YARN-4270, which added a limit on reservations.
      In FSAppAttempt.reserve():

      if (!reservationExceedsThreshold(node, type)) {
        LOG.info("Making reservation: node=" + node.getNodeName() +
            " app_id=" + getApplicationId());
        if (!alreadyReserved) {
          getMetrics().reserveResource(getUser(), container.getResource());
          RMContainer rmContainer =
              super.reserve(node, priority, null, container);
          node.reserveResource(this, priority, rmContainer);
          setReservation(node);
        } else {
          RMContainer rmContainer = node.getReservedContainer();
          super.reserve(node, priority, rmContainer, container);
          node.reserveResource(this, priority, rmContainer);
          setReservation(node);
        }
      }
      

      If the reservation exceeds the threshold, no reservation is set on the current node.
      But in attemptScheduling() in FairScheduler:

      while (node.getReservedContainer() == null) {
        boolean assignedContainer = false;
        if (!queueMgr.getRootQueue().assignContainer(node).equals(
            Resources.none())) {
          assignedContainers++;
          assignedContainer = true;
        }
        if (!assignedContainer) { break; }
        if (!assignMultiple) { break; }
        if ((assignedContainers >= maxAssign) && (maxAssign > 0)) { break; }
      }
      

      In that case assignContainer(node) still returns FairScheduler.CONTAINER_RESERVED, which is not
      equal to Resources.none(), so assignedContainer is set to true while node.getReservedContainer() remains null.
      As a result, if assignMultiple is enabled and maxAssign is unlimited, this while loop never breaks.
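
      The failure mode can be reproduced in isolation with a toy model. The sketch below is not the YARN code: the Node class, the string constants standing in for Resources.none() and FairScheduler.CONTAINER_RESERVED, and the safetyCap parameter are all assumptions made for illustration. The cap exists only so the demonstration terminates.

```java
/** Toy model of the loop described above; none of these types are real YARN classes. */
class ReservationLoopDemo {
    // Hypothetical stand-ins for Resources.none() and FairScheduler.CONTAINER_RESERVED.
    static final String NONE = "none";
    static final String CONTAINER_RESERVED = "reserved";

    /** Hypothetical node that only tracks whether a container is reserved on it. */
    static class Node {
        Object reservedContainer = null;
    }

    /**
     * Mimics the problematic path: the reservation is declined because it
     * exceeds the threshold, so the node is left untouched, yet
     * CONTAINER_RESERVED is still returned to the caller.
     */
    static String assignContainer(Node node) {
        boolean reservationExceedsThreshold = true; // assumed for this demo
        if (!reservationExceedsThreshold) {
            node.reservedContainer = new Object();
        }
        return CONTAINER_RESERVED;
    }

    /** Same control flow as the loop in the report; returns the iteration count. */
    static int attemptScheduling(Node node, boolean assignMultiple,
                                 int maxAssign, int safetyCap) {
        int assignedContainers = 0;
        int iterations = 0;
        // safetyCap is not in the original loop; it only bounds the demo.
        while (node.reservedContainer == null && iterations < safetyCap) {
            iterations++;
            boolean assignedContainer = false;
            if (!assignContainer(node).equals(NONE)) {
                assignedContainers++;
                assignedContainer = true;
            }
            if (!assignedContainer) { break; }
            if (!assignMultiple) { break; }
            if ((assignedContainers >= maxAssign) && (maxAssign > 0)) { break; }
        }
        return iterations;
    }

    public static void main(String[] args) {
        // assignMultiple on, non-positive maxAssign (treated as unlimited): only the cap stops it.
        System.out.println(attemptScheduling(new Node(), true, -1, 1000));
        // assignMultiple off: the loop exits after one pass.
        System.out.println(attemptScheduling(new Node(), false, -1, 1000));
        // A positive maxAssign bounds the loop.
        System.out.println(attemptScheduling(new Node(), true, 3, 1000));
    }
}
```

      With assignMultiple enabled and a non-positive maxAssign, none of the three break statements fires, so the toy loop runs until it reaches the cap. Disabling assignMultiple or setting a positive maxAssign makes it exit.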

      I suggest that assignContainer(node) return Resources.none() rather than CONTAINER_RESERVED when the attempt does not take the reservation because of the threshold.
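
      One way this suggestion could look is sketched below, again as a toy model rather than the contents of the attached patches, which may differ. The boolean return value of reserve(), the exceedsThreshold parameter, the Node class, and the string constants are assumptions introduced for illustration.

```java
/** Toy sketch of the suggested behaviour; not the actual YARN-4477 patch. */
class ReservationFixSketch {
    // Hypothetical stand-ins for Resources.none() and FairScheduler.CONTAINER_RESERVED.
    static final String NONE = "none";
    static final String CONTAINER_RESERVED = "reserved";

    /** Hypothetical node that only tracks whether a container is reserved on it. */
    static class Node {
        Object reservedContainer = null;
    }

    /** Returns true only if the reservation was actually recorded on the node. */
    static boolean reserve(Node node, boolean exceedsThreshold) {
        if (exceedsThreshold) {
            return false; // declined: the node is left untouched
        }
        node.reservedContainer = new Object();
        return true;
    }

    /** Reports CONTAINER_RESERVED only when a reservation really exists. */
    static String assignContainer(Node node, boolean exceedsThreshold) {
        return reserve(node, exceedsThreshold) ? CONTAINER_RESERVED : NONE;
    }

    /** Same loop shape as in the report; returns the iteration count. */
    static int attemptScheduling(Node node, boolean exceedsThreshold,
                                 boolean assignMultiple, int maxAssign) {
        int assignedContainers = 0;
        int iterations = 0;
        while (node.reservedContainer == null) {
            iterations++;
            boolean assignedContainer = false;
            if (!assignContainer(node, exceedsThreshold).equals(NONE)) {
                assignedContainers++;
                assignedContainer = true;
            }
            if (!assignedContainer) { break; }
            if (!assignMultiple) { break; }
            if ((assignedContainers >= maxAssign) && (maxAssign > 0)) { break; }
        }
        return iterations;
    }

    public static void main(String[] args) {
        // Declined reservation: NONE is returned, so the first break fires.
        System.out.println(attemptScheduling(new Node(), true, true, -1));
        // Accepted reservation: the node is now reserved, so the while condition ends the loop.
        System.out.println(attemptScheduling(new Node(), false, true, -1));
    }
}
```

      In this sketch the loop terminates after a single pass in both cases, even with assignMultiple enabled and a non-positive maxAssign: a declined reservation triggers the first break, and an accepted one makes the while condition false.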

      Attachments

        1. YARN-4477.001.patch
          2 kB
          Tao Jie
        2. YARN-4477.002.patch
          5 kB
          Tao Jie
        3. YARN-4477.003.patch
          5 kB
          Tao Jie
        4. YARN-4477.004.patch
          5 kB
          Tao Jie


          People

            Assignee: Tao Jie
            Reporter: Tao Jie
            Votes: 0
            Watchers: 14
