  1. Hadoop YARN
  2. YARN-4618

RM Stops allocating containers if large number of pending containers


    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Duplicate
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: None
    • Labels: None

      Description

      In one of our tests, we found that when the RM has a very large number of pending container requests to serve, it stops assigning containers.

      The simulated cluster is 100 TB.

      Root total = 60k containers =
      Queue 1 = 30k containers = 1328800000 MB
      Queue 2 = 30k containers = 1428800000 MB
      Each container request is for 40 GB.

      The relevant check in ParentQueue#assignContainers is:

          // Check if this queue need more resource, simply skip allocation if this
          // queue doesn't need more resources.
          if (!super.hasPendingResourceRequest(node.getPartition(),
              clusterResource, schedulingMode)) {
            if (LOG.isDebugEnabled()) {
              LOG.debug("Skip this queue=" + getQueuePath()
                  + ", because it doesn't need more resource, schedulingMode="
                  + schedulingMode.name() + " node-partition=" + node.getPartition());
            }
            return CSAssignment.NULL_ASSIGNMENT;
          }
      

      When the pending resource exceeds MAX VALUE (presumably Integer.MAX_VALUE), it overflows and becomes negative (-167XXXXXXX MB), so the check above fails and NULL_ASSIGNMENT is always returned.
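The overflow can be reproduced in isolation. The sketch below is not Hadoop code: the class and method names are hypothetical, and it assumes pending memory is accumulated in a 32-bit int, as the description suggests. It adds the two per-queue figures quoted above in int and in long arithmetic. The wrapped value it yields need not match the one seen in the test run, since the exact pending amount at that moment is not given.

```java
public class PendingMemoryOverflow {
    // Hypothetical helper: sums pending memory (MB) in 32-bit arithmetic.
    // Java int addition wraps silently once the result exceeds Integer.MAX_VALUE.
    static int sumAsInt(int queue1Mb, int queue2Mb) {
        return queue1Mb + queue2Mb;
    }

    // Same sum widened to 64 bits before adding, so it cannot wrap here.
    static long sumAsLong(int queue1Mb, int queue2Mb) {
        return (long) queue1Mb + queue2Mb;
    }

    public static void main(String[] args) {
        int queue1Mb = 1_328_800_000; // Queue 1 figure from the description
        int queue2Mb = 1_428_800_000; // Queue 2 figure from the description
        System.out.println("int sum (MB):  " + sumAsInt(queue1Mb, queue2Mb));
        System.out.println("long sum (MB): " + sumAsLong(queue1Mb, queue2Mb));
    }
}
```

Any "pending > 0" style comparison on the int sum would report that nothing is pending, which is consistent with the queue being skipped.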

      Tool used for testing: SLS (Scheduler Load Simulator).

      When checking for a pending resource request, we should first check whether there are any pending containers (from getMetrics()) to be served. If pending containers are available, return true; otherwise fall back to the other check for increase requests.
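A minimal sketch of the proposed ordering, under stated assumptions: `Metrics` is a hypothetical stand-in rather than the Hadoop metrics class, and `hasPendingByResource` stands for whatever resource-based check exists today. The point is only that a positive pending-container count short-circuits the overflow-prone memory comparison.

```java
public class PendingCheckSketch {
    /** Hypothetical stand-in for queue metrics; not a Hadoop class. */
    static final class Metrics {
        final int pendingContainers;
        final int pendingMemoryMb; // may have wrapped negative on overflow

        Metrics(int pendingContainers, int pendingMemoryMb) {
            this.pendingContainers = pendingContainers;
            this.pendingMemoryMb = pendingMemoryMb;
        }
    }

    // Stand-in for the existing resource-based check (assumed: pending memory > 0).
    static boolean hasPendingByResource(Metrics m) {
        return m.pendingMemoryMb > 0;
    }

    // Proposed order: consult the container count first, then fall back.
    static boolean hasPendingResourceRequest(Metrics m) {
        if (m.pendingContainers > 0) {
            return true; // containers are waiting, whatever the memory sum says
        }
        return hasPendingByResource(m);
    }

    public static void main(String[] args) {
        Metrics overflowed = new Metrics(60_000, -1_537_367_296);
        System.out.println(hasPendingResourceRequest(overflowed));
    }
}
```

This only works around the symptom in the check itself; the pending memory value would still be wrong wherever else it is read.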

      Thoughts?

            People

            • Assignee: Bibin Chundatt (bibinchundatt)
            • Reporter: Bibin Chundatt (bibinchundatt)
