Hadoop YARN / YARN-2628

Capacity scheduler with DominantResourceCalculator carries out reservation even though slots are free

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.5.1
    • Fix Version/s: 2.6.0
    • Component/s: capacityscheduler
    • Labels: None
    • Hadoop Flags: Reviewed

    Description

      We've noticed that if you run the CapacityScheduler with the DominantResourceCalculator, sometimes apps will end up with containers in a reserved state even though free slots are available.

      The root cause seems to be this piece of code from CapacityScheduler.java:

          // Try to schedule more if there are no reservations to fulfill
          if (node.getReservedContainer() == null) {
            if (Resources.greaterThanOrEqual(calculator, getClusterResource(),
                node.getAvailableResource(), minimumAllocation)) {
              if (LOG.isDebugEnabled()) {
                LOG.debug("Trying to schedule on node: " + node.getNodeName() +
                    ", available: " + node.getAvailableResource());
              }
              root.assignContainers(clusterResource, node, false);
            }
          } else {
            LOG.info("Skipping scheduling since node " + node.getNodeID() +
                " is reserved by application " +
                node.getReservedContainer().getContainerId().getApplicationAttemptId());
          }
      

      The code is meant to check whether a node has room for at least one more container. Because it uses the greaterThanOrEqual function, which under the DominantResourceCalculator compares dominant resource shares rather than each resource individually, we end up in a situation where greaterThanOrEqual returns true even though we may not have enough CPU or memory to actually run the container.
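
      To make the failure mode concrete, here is a minimal standalone sketch. It does not use any Hadoop classes: Res, share, compare and fitsIn are hypothetical names, and the comparison is only modeled loosely on a dominant-share ordering (compare the larger of the two cluster shares, break ties on the smaller one). The cluster size, node state and minimum allocation are made-up numbers chosen for illustration.

```java
// Standalone illustration only; none of these types are Hadoop classes.
final class DominantShareDemo {

    /** Hypothetical two-dimensional resource vector: memory in MB and virtual cores. */
    static final class Res {
        final long memoryMb;
        final int vcores;

        Res(long memoryMb, int vcores) {
            this.memoryMb = memoryMb;
            this.vcores = vcores;
        }
    }

    /** Fraction of the cluster that r takes in its dominant dimension (or the other one). */
    static double share(Res cluster, Res r, boolean dominant) {
        double mem = (double) r.memoryMb / cluster.memoryMb;
        double cpu = (double) r.vcores / cluster.vcores;
        return dominant ? Math.max(mem, cpu) : Math.min(mem, cpu);
    }

    /** Dominant-share ordering: compare dominant shares, break ties on the other dimension. */
    static int compare(Res cluster, Res lhs, Res rhs) {
        int c = Double.compare(share(cluster, lhs, true), share(cluster, rhs, true));
        return c != 0 ? c : Double.compare(share(cluster, lhs, false), share(cluster, rhs, false));
    }

    static boolean greaterThanOrEqual(Res cluster, Res lhs, Res rhs) {
        return compare(cluster, lhs, rhs) >= 0;
    }

    /** Component-wise check: the request must fit in every dimension. */
    static boolean fitsIn(Res request, Res available) {
        return request.memoryMb <= available.memoryMb && request.vcores <= available.vcores;
    }

    public static void main(String[] args) {
        Res cluster = new Res(102400, 100); // assumed cluster: 100 GB, 100 vcores
        Res available = new Res(8192, 0);   // node has memory left but no free vcores
        Res minimum = new Res(1024, 1);     // assumed minimum allocation

        // The dominant share of "available" (8%) exceeds that of "minimum" (1%).
        System.out.println("greaterThanOrEqual: " + greaterThanOrEqual(cluster, available, minimum));
        // Yet the minimum allocation needs one vcore and the node has none.
        System.out.println("fitsIn: " + fitsIn(minimum, available));
    }
}
```

      In this sketch the dominant-share comparison passes while the component-wise check fails, which matches the symptom described above: the scheduler attempts to assign a container on a node that cannot host one. A per-resource check is one possible way to guard the call; whether the attached patches take that route is not stated here.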

      Attachments

        1. apache-yarn-2628.0.patch
          5 kB
          Varun Vasudev
        2. apache-yarn-2628.1.patch
          6 kB
          Varun Vasudev

          People

            Assignee: Varun Vasudev (vvasudev)
            Reporter: Varun Vasudev (vvasudev)
            Votes: 0
            Watchers: 6
