Hadoop YARN > YARN-5139 [Umbrella] Move YARN scheduler towards global scheduler > YARN-10293

Reserved Containers not allocated from available space of other nodes in CandidateNodeSet in MultiNodePlacement (YARN-10259)


    Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.3.0
    • Fix Version/s: 3.4.0
    • Component/s: None
    • Labels: None

      Description

      Reserved containers are not allocated from the available space of other nodes in the CandidateNodeSet when MultiNodePlacement is enabled. YARN-10259 fixed two related issues: https://issues.apache.org/jira/browse/YARN-10259?focusedCommentId=17105987&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17105987

      We have found one more bug in CapacityScheduler.java which causes the same issue, with a slight difference in the repro.

      Repro:

      Nodes : Capacity : Used
      Node1 - 8GB, 8vcores - 8GB, 8vcores
      Node2 - 8GB, 8vcores - 8GB, 8vcores
      Node3 - 8GB, 8vcores - 8GB, 8vcores

      Queues -> A and B both 50% capacity, 100% max capacity

      MultiNode enabled + Preemption enabled
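      A queue and multi-node placement setup matching this repro could look roughly like the following capacity-scheduler.xml fragment. This is a sketch, not the exact cluster config used in the repro; property names follow the Hadoop 3.3 CapacityScheduler documentation.

```xml
<!-- Two queues, 50% capacity each, both allowed to grow to 100% -->
<property>
  <name>yarn.scheduler.capacity.root.queues</name>
  <value>A,B</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.A.capacity</name>
  <value>50</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.A.maximum-capacity</name>
  <value>100</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.B.capacity</name>
  <value>50</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.B.maximum-capacity</name>
  <value>100</value>
</property>
<!-- Enable multi-node placement -->
<property>
  <name>yarn.scheduler.capacity.multi-node-placement-enabled</name>
  <value>true</value>
</property>
```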

      1. JobA is submitted to queue A and uses the full cluster: 24GB and 24 vcores.

      2. JobB is submitted to queue B with an AM size of 1GB.

      2020-05-21 12:12:27,313 INFO org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=systest  IP=172.27.160.139       OPERATION=Submit Application Request    TARGET=ClientRMService  RESULT=SUCCESS  APPID=application_1590046667304_0005    CALLERCONTEXT=CLI       QUEUENAME=dummy
      

      3. Preemption kicks in and the used capacity drops below 1.0f.

      2020-05-21 12:12:48,222 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptMetrics: Non-AM container preempted, current appAttemptId=appattempt_1590046667304_0004_000001, containerId=container_e09_1590046667304_0004_01_000024, resource=<memory:1024, vCores:1>
      

      4. JobB gets a reserved container via CapacityScheduler#allocateOrReserveNewContainers.

      2020-05-21 12:12:48,226 INFO org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: container_e09_1590046667304_0005_01_000001 Container Transitioned from NEW to RESERVED
      2020-05-21 12:12:48,226 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp: Reserved container=container_e09_1590046667304_0005_01_000001, on node=host: tajmera-fullnodes-3.tajmera-fullnodes.root.hwx.site:8041 #containers=8 available=<memory:0, vCores:0> used=<memory:8192, vCores:8> with resource=<memory:1024, vCores:1>
      

      Why did RegularContainerAllocator reserve the container when the used capacity was <= 1.0f?

      Because even though the container was preempted, the NodeManager still has to stop the container and then report the freed resources to the ResourceManager on its next heartbeat. Until that happens, the node shows no available or unallocated resources, so the allocator can only reserve.
      

      5. Now no new allocation happens and the container stays reserved.

      After the reservation, the used capacity becomes 1.0f, the code path below runs in a loop, and no new allocate or reserve happens. The reserved container cannot be allocated because the reserved node has no space. Node2 has room for 1GB, 1vcore, but CapacityScheduler#allocateOrReserveNewContainers never gets called, causing the hang.

      [INFINITE LOOP] CapacityScheduler#allocateContainersOnMultiNodes -> CapacityScheduler#allocateFromReservedContainer -> Re-reserve the container on node

      2020-05-21 12:13:33,242 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler: Trying to fulfill reservation for application application_1590046667304_0005 on node: tajmera-fullnodes-3.tajmera-fullnodes.root.hwx.site:8041
      2020-05-21 12:13:33,242 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue: assignContainers: partition= #applications=1
      2020-05-21 12:13:33,242 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp: Reserved container=container_e09_1590046667304_0005_01_000001, on node=host: tajmera-fullnodes-3.tajmera-fullnodes.root.hwx.site:8041 #containers=8 available=<memory:0, vCores:0> used=<memory:8192, vCores:8> with resource=<memory:1024, vCores:1>
      2020-05-21 12:13:33,243 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler: Allocation proposal accepted
      

      CapacityScheduler#allocateOrReserveNewContainers won't be called because the following check in allocateContainersOnMultiNodes evaluates to true and skips new allocations:

       if (getRootQueue().getQueueCapacities().getUsedCapacity(
           candidates.getPartition()) >= 1.0f
           && preemptionManager.getKillableResource(
           ...
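      The effect of that guard can be illustrated with a small, self-contained model using the numbers from this repro. This is an illustrative sketch, not the actual CapacityScheduler code; usedCapacity() here is a hypothetical helper that mimics the root queue's used capacity, in which the reserved resource counts as used.

```java
// Illustrative model of the guard in allocateContainersOnMultiNodes.
// Not the real scheduler code: usedCapacity() is a hypothetical helper.
public class MultiNodeHangSketch {

    // Fraction of cluster memory considered used; the reservation counts as used.
    static float usedCapacity(int[] usedMb, int[] totalMb, int reservedMb) {
        int used = reservedMb, total = 0;
        for (int i = 0; i < usedMb.length; i++) {
            used += usedMb[i];
            total += totalMb[i];
        }
        return (float) used / total;
    }

    public static void main(String[] args) {
        int[] totalMb = {8192, 8192, 8192}; // Node1..Node3, 8GB each
        int[] usedMb  = {8192, 7168, 8192}; // 1GB freed on Node2 by preemption
        int reservedMb = 1024;              // JobB's AM reserved on Node3

        float cap = usedCapacity(usedMb, totalMb, reservedMb);

        // With the 1GB reservation counted, used capacity is back at 1.0f, so
        // the ">= 1.0f && nothing killable" guard skips
        // allocateOrReserveNewContainers entirely -- even though Node2 has
        // 1GB free that could satisfy the reserved request.
        boolean skipsNewAllocation = cap >= 1.0f;
        System.out.println("usedCapacity=" + cap
            + " skipsNewAllocation=" + skipsNewAllocation);
    }
}
```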
      

        Attachments

        1. YARN-10293-001.patch
          18 kB
          Prabhu Joseph
        2. YARN-10293-002.patch
          17 kB
          Prabhu Joseph
        3. YARN-10293-003-WIP.patch
          19 kB
          Prabhu Joseph
        4. YARN-10293-004.patch
          19 kB
          Prabhu Joseph
        5. YARN-10293-005.patch
          18 kB
          Prabhu Joseph

              People

              • Assignee: prabhujoseph Prabhu Joseph
              • Reporter: prabhujoseph Prabhu Joseph
