Details
Bug
Status: Resolved
Major
Resolution: Fixed
3.5.0
Reviewed
Description
Error symptoms
A queue hierarchy in absolute mode cannot be modified when the parent queue, or every child queue of that parent, has a min resource of 0 configured.
2024-01-05 15:38:59,016 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacitySchedulerQueueManager: Initialized queue: root.a.c
2024-01-05 15:38:59,016 ERROR org.apache.hadoop.yarn.server.resourcemanager.webapp.RMWebServices: Exception thrown when modifying configuration.
java.io.IOException: Failed to re-init queues : Parent=root.a: When absolute minResource is used, we must make sure both parent and child all use absolute minResource
Reproduction
capacity-scheduler.xml
<?xml version="1.0"?>
<configuration>
  <property>
    <name>yarn.scheduler.capacity.root.queues</name>
    <value>default,a</value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.capacity</name>
    <value>[memory=40960, vcores=16]</value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.default.capacity</name>
    <value>[memory=1024, vcores=1]</value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.default.maximum-capacity</name>
    <value>[memory=1024, vcores=1]</value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.a.capacity</name>
    <value>[memory=0, vcores=0]</value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.a.maximum-capacity</name>
    <value>[memory=39936, vcores=15]</value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.a.queues</name>
    <value>b,c</value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.a.b.capacity</name>
    <value>[memory=0, vcores=0]</value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.a.b.maximum-capacity</name>
    <value>[memory=39936, vcores=15]</value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.a.c.capacity</name>
    <value>[memory=0, vcores=0]</value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.a.c.maximum-capacity</name>
    <value>[memory=39936, vcores=15]</value>
  </property>
</configuration>
updatequeue.xml
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<sched-conf>
  <update-queue>
    <queue-name>root.a</queue-name>
    <params>
      <entry>
        <key>capacity</key>
        <value>[memory=1024,vcores=1]</value>
      </entry>
      <entry>
        <key>maximum-capacity</key>
        <value>[memory=39936,vcores=15]</value>
      </entry>
    </params>
  </update-queue>
</sched-conf>
$ curl -X PUT -H 'Content-Type: application/xml' -d @updatequeue.xml http://localhost:8088/ws/v1/cluster/scheduler-conf\?user.name\=yarn
Failed to re-init queues : Parent=root.a: When absolute minResource is used, we must make sure both parent and child all use absolute minResource
Root cause
setChildQueues is called during reinit and performs the following check:
void setChildQueues(Collection<CSQueue> childQueues) throws IOException {
  writeLock.lock();
  try {
    boolean isLegacyQueueMode = queueContext.getConfiguration().isLegacyQueueMode();
    if (isLegacyQueueMode) {
      QueueCapacityType childrenCapacityType =
          getCapacityConfigurationTypeForQueues(childQueues);
      QueueCapacityType parentCapacityType =
          getCapacityConfigurationTypeForQueues(ImmutableList.of(this));

      if (childrenCapacityType == QueueCapacityType.ABSOLUTE_RESOURCE
          || parentCapacityType == QueueCapacityType.ABSOLUTE_RESOURCE) {
        // We don't allow any mixed absolute + {weight, percentage} between
        // children and parent
        if (childrenCapacityType != parentCapacityType && !this.getQueuePath()
            .equals(CapacitySchedulerConfiguration.ROOT)) {
          throw new IOException("Parent=" + this.getQueuePath()
              + ": When absolute minResource is used, we must make sure both "
              + "parent and child all use absolute minResource");
        }
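parentCapacityType or childrenCapacityType is then treated as PERCENTAGE, because getCapacityConfigurationTypeForQueues fails to detect absolute mode when the configured min resource is zero. A value of [memory=0, vcores=0] equals Resources.none(), so the flag is never set here: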
if (!queue.getQueueResourceQuotas().getConfiguredMinResource(nodeLabel)
    .equals(Resources.none())) {
  absoluteMinResSet = true;
(It only happens in legacy queue mode.)
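The failure mode can be reproduced outside of YARN with a small standalone sketch. Everything in it is a simplified, hypothetical stand-in (Res, NONE, parseMinResource, detectByValue, detectBySyntax are illustration names, not Hadoop APIs); it only shows why comparing the parsed value against an all-zero resource cannot tell "absolute mode with zeros" apart from "not absolute at all", while looking at how the value was written can.

```java
import java.util.Objects;

public class AbsoluteModeDetection {

  /** Simplified, hypothetical stand-in for a YARN Resource (memory in MB, vcores). */
  static final class Res {
    final long memory;
    final long vcores;

    Res(long memory, long vcores) {
      this.memory = memory;
      this.vcores = vcores;
    }

    @Override
    public boolean equals(Object o) {
      if (!(o instanceof Res)) {
        return false;
      }
      Res other = (Res) o;
      return memory == other.memory && vcores == other.vcores;
    }

    @Override
    public int hashCode() {
      return Objects.hash(memory, vcores);
    }
  }

  /** Plays the role of Resources.none(): every resource is zero. */
  static final Res NONE = new Res(0, 0);

  /** True if the raw value is written in bracket form, e.g. "[memory=0, vcores=0]". */
  static boolean isBracketForm(String raw) {
    String v = raw.trim();
    return v.startsWith("[") && v.endsWith("]");
  }

  /** Parses "[memory=X, vcores=Y]"; any non-bracket value yields NONE. */
  static Res parseMinResource(String raw) {
    if (!isBracketForm(raw)) {
      return NONE;
    }
    String v = raw.trim();
    long memory = 0;
    long vcores = 0;
    for (String part : v.substring(1, v.length() - 1).split(",")) {
      String[] kv = part.trim().split("=");
      if (kv.length != 2) {
        continue;
      }
      String key = kv[0].trim();
      if (key.equals("memory")) {
        memory = Long.parseLong(kv[1].trim());
      } else if (key.equals("vcores")) {
        vcores = Long.parseLong(kv[1].trim());
      }
    }
    return new Res(memory, vcores);
  }

  /** Value-based detection, analogous to comparing against Resources.none(). */
  static boolean detectByValue(String raw) {
    return !parseMinResource(raw).equals(NONE);
  }

  /** Syntax-based detection: looks at how the value was written, not at its value. */
  static boolean detectBySyntax(String raw) {
    return isBracketForm(raw);
  }

  public static void main(String[] args) {
    String zero = "[memory=0, vcores=0]";
    String nonZero = "[memory=1024, vcores=1]";
    System.out.println(detectByValue(nonZero)); // true
    System.out.println(detectByValue(zero));    // false: absolute mode is missed
    System.out.println(detectBySyntax(zero));   // true
    System.out.println(detectBySyntax("50"));   // false
  }
}
```

With the reproduction config, root.a, root.a.b and root.a.c all fall into the detectByValue(zero) case, so updating only root.a to a non-zero value makes parent and children appear to use different modes.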
Possible fixes
Possible fix in AbstractParentQueue.getCapacityConfigurationTypeForQueues using the capacityVector:
for (CSQueue queue : queues) {
  for (String nodeLabel : queueCapacities.getExistingNodeLabels()) {
    Set<QueueCapacityVector.ResourceUnitCapacityType> definedCapacityTypes =
        queue.getConfiguredCapacityVector(nodeLabel).getDefinedCapacityTypes();
    if (definedCapacityTypes.size() == 1) {
      QueueCapacityVector.ResourceUnitCapacityType next =
          definedCapacityTypes.iterator().next();
      if (Objects.requireNonNull(next) == PERCENTAGE) {
        percentageIsSet = true;
        diagMsg.append("{Queue=").append(queue.getQueuePath())
            .append(", label=").append(nodeLabel)
            .append(" uses percentage mode}. ");
      } else if (next == QueueCapacityVector.ResourceUnitCapacityType.ABSOLUTE) {
        absoluteMinResSet = true;
        diagMsg.append("{Queue=").append(queue.getQueuePath())
            .append(", label=").append(nodeLabel)
            .append(" uses absolute mode}. ");
      } else if (next == QueueCapacityVector.ResourceUnitCapacityType.WEIGHT) {
        weightIsSet = true;
        diagMsg.append("{Queue=").append(queue.getQueuePath())
            .append(", label=").append(nodeLabel)
            .append(" uses weight mode}. ");
      }
    } else if (definedCapacityTypes.size() > 1) {
      mixedIsSet = true;
      diagMsg.append("{Queue=").append(queue.getQueuePath())
          .append(", label=").append(nodeLabel)
          .append(" uses mixed mode}. ");
    }
  }
}
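The classification rule in the snippet above can be isolated into a standalone sketch. UnitType, Mode and classify are hypothetical names for illustration only (Mode.UNSET in particular is an assumption; the snippet above simply sets no flag when no type is defined). The point is that the decision depends on which unit types are defined, not on the numeric values, so an all-zero absolute vector would presumably still classify as absolute.

```java
import java.util.EnumSet;
import java.util.Set;

public class CapacityModeSketch {

  /** Hypothetical mirror of the per-resource unit types used in the snippet above. */
  enum UnitType { PERCENTAGE, ABSOLUTE, WEIGHT }

  /** Hypothetical result type; UNSET is an assumption for the empty case. */
  enum Mode { PERCENTAGE, ABSOLUTE, WEIGHT, MIXED, UNSET }

  /** One defined type decides the mode; more than one means mixed mode. */
  static Mode classify(Set<UnitType> definedTypes) {
    if (definedTypes.isEmpty()) {
      return Mode.UNSET;
    }
    if (definedTypes.size() > 1) {
      return Mode.MIXED;
    }
    switch (definedTypes.iterator().next()) {
      case PERCENTAGE:
        return Mode.PERCENTAGE;
      case ABSOLUTE:
        return Mode.ABSOLUTE;
      default:
        return Mode.WEIGHT;
    }
  }

  public static void main(String[] args) {
    // Only the set of unit types matters, so zero values do not change the result.
    System.out.println(classify(EnumSet.of(UnitType.ABSOLUTE)));                  // ABSOLUTE
    System.out.println(classify(EnumSet.of(UnitType.ABSOLUTE, UnitType.WEIGHT))); // MIXED
  }
}
```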
For versions that predate capacityVector, we could use checkConfigTypeIsAbsoluteResource instead, e.g.:
- if (!queue.getQueueResourceQuotas().getConfiguredMinResource(nodeLabel)
-     .equals(Resources.none())) {
+ if (checkConfigTypeIsAbsoluteResource(queue.getQueuePath(), nodeLabel)) {