Details
Type: Bug
Status: Closed
Priority: Major
Resolution: Fixed
2.5.1
None
Reviewed
Description
We've noticed that when the CapacityScheduler runs with the DominantResourceCalculator, apps sometimes end up with containers in a reserved state even though free slots are available.
The root cause seems to be this piece of code from CapacityScheduler.java:

    // Try to schedule more if there are no reservations to fulfill
    if (node.getReservedContainer() == null) {
      if (Resources.greaterThanOrEqual(calculator, getClusterResource(),
          node.getAvailableResource(), minimumAllocation)) {
        if (LOG.isDebugEnabled()) {
          LOG.debug("Trying to schedule on node: " + node.getNodeName() +
              ", available: " + node.getAvailableResource());
        }
        root.assignContainers(clusterResource, node, false);
      }
    } else {
      LOG.info("Skipping scheduling since node " + node.getNodeID() +
          " is reserved by application " +
          node.getReservedContainer().getContainerId().getApplicationAttemptId()
          );
    }
The code is meant to check whether a node has any slots available for containers. Because it uses the greaterThanOrEqual function, which under the DominantResourceCalculator compares dominant shares rather than each resource individually, the check can return true even though the node does not have enough CPU or memory to actually run the container.
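To illustrate the mismatch, here is a minimal, self-contained sketch. It is not Hadoop's code: the class DominantShareDemo, the Res type, and the helpers dominantShare, dominantGreaterThanOrEqual and fitsIn are hypothetical stand-ins, and the comparison is a simplified model that looks only at the dominant share. The cluster size and node values are made up for the example.

```java
public class DominantShareDemo {

    // Hypothetical stand-in for a YARN Resource: memory in MB plus virtual cores.
    static final class Res {
        final long memoryMb;
        final long vcores;

        Res(long memoryMb, long vcores) {
            this.memoryMb = memoryMb;
            this.vcores = vcores;
        }
    }

    // Largest per-resource fraction of the cluster, i.e. the "dominant" share.
    static double dominantShare(Res cluster, Res r) {
        return Math.max((double) r.memoryMb / cluster.memoryMb,
                        (double) r.vcores / cluster.vcores);
    }

    // Simplified model of a dominant-share comparison:
    // only the dominant shares of the two resources are compared.
    static boolean dominantGreaterThanOrEqual(Res cluster, Res lhs, Res rhs) {
        return dominantShare(cluster, lhs) >= dominantShare(cluster, rhs);
    }

    // Component-wise check: every dimension of the request must fit.
    static boolean fitsIn(Res request, Res available) {
        return request.memoryMb <= available.memoryMb
            && request.vcores <= available.vcores;
    }

    public static void main(String[] args) {
        Res cluster = new Res(102400, 100);        // assumed cluster: 100 GB, 100 vcores
        Res available = new Res(0, 10);            // node has spare vcores but no memory
        Res minimumAllocation = new Res(1024, 1);  // assumed minimum: 1 GB, 1 vcore

        // Dominant shares are 0.10 (available) vs 0.01 (minimum), so this prints true.
        System.out.println(dominantGreaterThanOrEqual(cluster, available, minimumAllocation));

        // The node has 0 MB of memory free, so the request does not fit: prints false.
        System.out.println(fitsIn(minimumAllocation, available));
    }
}
```

In this model the node passes the scheduling check on the strength of its free vcores alone, although a minimum-size container could never start there. A per-resource check such as the fitsIn sketch would reject it.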