Details
Sub-task
Status: Resolved
Major
Resolution: Fixed
3.3.0
None
Description
Reserved containers are not allocated from the available space of other nodes in the CandidateNodeSet when MultiNodePlacement is enabled. YARN-10259 fixed two issues related to this: https://issues.apache.org/jira/browse/YARN-10259?focusedCommentId=17105987&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17105987
I have found one more bug in CapacityScheduler.java that causes the same issue, with a slightly different repro.
Repro:
Nodes : Available : Used
Node1 : 8GB, 8vcores : 8GB, 8vcores
Node2 : 8GB, 8vcores : 8GB, 8vcores
Node3 : 8GB, 8vcores : 8GB, 8vcores
Queues -> A and B both 50% capacity, 100% max capacity
MultiNode enabled + Preemption enabled
1. JobA is submitted to queue A and uses the full cluster: 24GB and 24 vcores
2. JobB is submitted to queue B with an AM size of 1GB
2020-05-21 12:12:27,313 INFO org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=systest IP=172.27.160.139 OPERATION=Submit Application Request TARGET=ClientRMService RESULT=SUCCESS APPID=application_1590046667304_0005 CALLERCONTEXT=CLI QUEUENAME=dummy
3. Preemption happens and used capacity drops below 1.0f
2020-05-21 12:12:48,222 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptMetrics: Non-AM container preempted, current appAttemptId=appattempt_1590046667304_0004_000001, containerId=container_e09_1590046667304_0004_01_000024, resource=<memory:1024, vCores:1>
4. JobB gets a reserved container as part of CapacityScheduler#allocateOrReserveNewContainers
2020-05-21 12:12:48,226 INFO org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: container_e09_1590046667304_0005_01_000001 Container Transitioned from NEW to RESERVED
2020-05-21 12:12:48,226 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp: Reserved container=container_e09_1590046667304_0005_01_000001, on node=host: tajmera-fullnodes-3.tajmera-fullnodes.root.hwx.site:8041 #containers=8 available=<memory:0, vCores:0> used=<memory:8192, vCores:8> with resource=<memory:1024, vCores:1>
Why did RegularContainerAllocator reserve the container when the used capacity is <= 1.0f?
Because even though the container has been preempted, the NodeManager still has to stop the container and heartbeat to the ResourceManager before the node's available and unallocated resources are updated.
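The lag described above can be illustrated with a toy model. This is only a sketch under simplifying assumptions, not YARN's actual accounting: the class `PreemptionLagSketch` and all of its members are hypothetical, a single node stands in for the cluster's nodes, and only memory is tracked.

```java
// Toy model (hypothetical names): cluster-wide usage drops as soon as a
// container is marked preempted, but the node's own view only changes
// once the node reports the container as stopped.
final class PreemptionLagSketch {
    static final int CLUSTER_MB = 24 * 1024; // 3 nodes x 8GB, as in the repro
    static final int NODE_MB = 8 * 1024;

    int clusterUsedMb = CLUSTER_MB; // JobA uses the full cluster
    int nodeUsedMb = NODE_MB;       // usage recorded for the node being considered

    float usedCapacity() {
        return (float) clusterUsedMb / CLUSTER_MB;
    }

    int nodeAvailableMb() {
        return NODE_MB - nodeUsedMb;
    }

    // Scheduler-side bookkeeping when a container is preempted.
    void markPreempted(int mb) {
        clusterUsedMb -= mb;
    }

    // Node-side update, assumed to arrive later with a heartbeat.
    void nodeHeartbeatReleases(int mb) {
        nodeUsedMb -= mb;
    }

    // Outcome of trying to place a request of askMb on the node.
    String place(int askMb) {
        if (usedCapacity() >= 1.0f) {
            return "SKIPPED";
        }
        return nodeAvailableMb() >= askMb ? "ALLOCATED" : "RESERVED";
    }

    public static void main(String[] args) {
        PreemptionLagSketch s = new PreemptionLagSketch();
        System.out.println("before preemption: " + s.place(1024));
        s.markPreempted(1024);
        System.out.println("after preemption, before heartbeat: " + s.place(1024));
        s.nodeHeartbeatReleases(1024);
        System.out.println("after heartbeat: " + s.place(1024));
    }
}
```

In this model the request is reserved in the window between the preemption and the node's update, which matches the log entry in step 4 where the node still reports available=<memory:0, vCores:0>.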
5. Now no new allocation happens and the reserved container stays in the RESERVED state.
After the reservation, used capacity becomes 1.0f, so the scheduler keeps cycling through the loop below and no new allocation or reservation happens. The reserved container cannot be allocated because the node it is reserved on has no space. Node2 has space for 1GB, 1 vcore, but CapacityScheduler#allocateOrReserveNewContainers is never called, which causes the hang.
[INFINITE LOOP] CapacityScheduler#allocateContainersOnMultiNodes -> CapacityScheduler#allocateFromReservedContainer -> container is re-reserved on the same node
2020-05-21 12:13:33,242 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler: Trying to fulfill reservation for application application_1590046667304_0005 on node: tajmera-fullnodes-3.tajmera-fullnodes.root.hwx.site:8041
2020-05-21 12:13:33,242 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue: assignContainers: partition= #applications=1
2020-05-21 12:13:33,242 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp: Reserved container=container_e09_1590046667304_0005_01_000001, on node=host: tajmera-fullnodes-3.tajmera-fullnodes.root.hwx.site:8041 #containers=8 available=<memory:0, vCores:0> used=<memory:8192, vCores:8> with resource=<memory:1024, vCores:1>
2020-05-21 12:13:33,243 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler: Allocation proposal accepted
CapacityScheduler#allocateOrReserveNewContainers is not called because of the check below in allocateContainersOnMultiNodes:
if (getRootQueue().getQueueCapacities().getUsedCapacity(
candidates.getPartition()) >= 1.0f
&& preemptionManager.getKillableResource(
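The effect of this guard on the repro can be sketched with a small simulation. It is a simplified illustration, not the CapacityScheduler code: `MultiNodeGuardSketch`, `Node`, `schedulingPass`, `allocateNew` and `runPasses` are hypothetical names, only memory is modeled, and the killable-resource half of the condition is reduced to a boolean.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model (hypothetical names) of the guard described above.
final class MultiNodeGuardSketch {
    static final class Node {
        final String name;
        int availableMb;
        int reservedMb; // 0 means no reservation on this node

        Node(String name, int availableMb, int reservedMb) {
            this.name = name;
            this.availableMb = availableMb;
            this.reservedMb = reservedMb;
        }
    }

    // One scheduling pass; returns the node that received the container, or null.
    static String schedulingPass(List<Node> nodes, float usedCapacity,
                                 boolean killableIsNone, int askMb) {
        if (usedCapacity >= 1.0f && killableIsNone) {
            // Only reserved containers are retried, each on its own node.
            for (Node n : nodes) {
                if (n.reservedMb > 0 && n.availableMb >= n.reservedMb) {
                    n.availableMb -= n.reservedMb;
                    n.reservedMb = 0;
                    return n.name;
                }
                // Otherwise the container stays reserved on the same node.
            }
            return null; // the new-allocation path is never reached
        }
        return allocateNew(nodes, askMb);
    }

    // Stand-in for the new-allocation path: first node with enough space wins.
    static String allocateNew(List<Node> nodes, int askMb) {
        for (Node n : nodes) {
            if (n.availableMb >= askMb) {
                n.availableMb -= askMb;
                return n.name;
            }
        }
        return null;
    }

    // State after step 5: node2 has 1GB free, node3 holds the 1GB reservation.
    static List<Node> reproNodes() {
        List<Node> nodes = new ArrayList<>();
        nodes.add(new Node("node1", 0, 0));
        nodes.add(new Node("node2", 1024, 0));
        nodes.add(new Node("node3", 0, 1024));
        return nodes;
    }

    static String runPasses(int passes, float usedCapacity, boolean killableIsNone) {
        List<Node> nodes = reproNodes();
        for (int i = 0; i < passes; i++) {
            String placedOn = schedulingPass(nodes, usedCapacity, killableIsNone, 1024);
            if (placedOn != null) {
                return placedOn;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // Used capacity at 1.0f: every pass only retries the reservation on node3.
        System.out.println("guard taken: " + runPasses(5, 1.0f, true));
        // For contrast, a pass that reaches the new-allocation path finds node2.
        System.out.println("guard not taken: " + runPasses(1, 0.99f, true));
    }
}
```

With used capacity at 1.0f, no number of passes places the container, because the free 1GB on node2 is never considered. The second call is only a contrast case and is not meant to represent the actual fix.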
Attachments
Issue Links
- is required by YARN-10259 Reserved Containers not allocated from available space of other nodes in CandidateNodeSet in MultiNodePlacement (Resolved)