[YARN-4931] Preempted resources go back to the same application - ASF JIRA

Details

Type: Bug
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: 2.7.2
Fix Version/s: None
Component/s: fairscheduler
Labels: None

Description

Sometimes a queue that needs resources triggers preemption, but the preempted containers are allocated right back to the application that just released them.

Here is a tiny application (0007) that wants resources, and a container is preempted from application 0002 to satisfy it:

2016-04-07 21:08:13,463 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler (FairSchedulerUpdateThread): Should preempt <memory:448, vCores:0> res for queue root.default: resDueToMinShare = <memory:0, vCores:0>, resDueToFairShare = <memory:448, vCores:0>
2016-04-07 21:08:13,463 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler (FairSchedulerUpdateThread): Preempting container (prio=1res=<memory:15264, vCores:1>) from queue root.milesc
2016-04-07 21:08:13,463 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptMetrics (FairSchedulerUpdateThread): Non-AM container preempted, current appAttemptId=appattempt_1460047303577_0002_000001, containerId=container_1460047303577_0002_01_001038, resource=<memory:15264, vCores:1>
2016-04-07 21:08:13,463 INFO org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl (FairSchedulerUpdateThread): container_1460047303577_0002_01_001038 Container Transitioned from RUNNING to KILLED
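
To make the mismatch in these lines concrete, here is a minimal sketch that pulls the resource figures out of the first two log messages and compares them. The helper name `parse_resources` and the regular expression are illustrative assumptions, not part of YARN, and the memory values are assumed to be in MB.

```python
import re

# Matches the "<memory:N, vCores:M>" form used in the log lines above.
RESOURCE = re.compile(r"<memory:(\d+), vCores:(\d+)>")


def parse_resources(line):
    """Return every <memory:N, vCores:M> pair in a line as (memory, vcores) tuples."""
    return [(int(mem), int(cores)) for mem, cores in RESOURCE.findall(line)]


# Message bodies copied from the log excerpt above (timestamps and class names omitted).
should_preempt = (
    "Should preempt <memory:448, vCores:0> res for queue root.default: "
    "resDueToMinShare = <memory:0, vCores:0>, "
    "resDueToFairShare = <memory:448, vCores:0>"
)
preempting = (
    "Preempting container (prio=1res=<memory:15264, vCores:1>) "
    "from queue root.milesc"
)

needed_mem = parse_resources(should_preempt)[0][0]  # amount the scheduler wants to free
freed_mem = parse_resources(preempting)[0][0]       # size of the container it kills
surplus_mem = freed_mem - needed_mem                # memory freed beyond what was needed

print(needed_mem, freed_mem, surplus_mem)
```

The container killed is far larger than the amount the scheduler set out to reclaim, so most of the freed memory is immediately available to whichever application asks next.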

But then a moment later, application 0002 gets the container right back:

2016-04-07 21:08:13,844 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerNode (ResourceManager Event Processor): Assigned container container_1460047303577_0002_01_001039 of capacity <memory:15264, vCores:1> on host ip-10-12-40-63.us-west-2.compute.internal:8041, which has 13 containers, <memory:241248, vCores:18> used and <memory:416, vCores:46> available after allocation
2016-04-07 21:08:14,555 INFO org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl (IPC Server handler 59 on 8030): container_1460047303577_0002_01_001039 Container Transitioned from ALLOCATED to ACQUIRED
2016-04-07 21:08:14,845 INFO org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl (ResourceManager Event Processor): container_1460047303577_0002_01_001039 Container Transitioned from ACQUIRED to RUNNING
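
The pattern in the two excerpts can be searched for across a whole ResourceManager log. The following is a rough sketch, not a YARN tool: the function name `find_reassignments`, the regular expressions, and the 5-second window are all assumptions chosen for illustration. It assumes container IDs of the form `container_<clusterTimestamp>_<appId>_<attempt>_<n>` as seen above and does not handle other ID formats.

```python
import re
from datetime import datetime, timedelta

# Leading timestamp, e.g. "2016-04-07 21:08:13,463".
TS = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})")
# Group 1 is "<clusterTimestamp>_<appId>", which identifies the application.
PREEMPTED = re.compile(r"container preempted.*containerId=container_(\d+_\d+)_\d+_\d+")
ASSIGNED = re.compile(r"Assigned container container_(\d+_\d+)_\d+_\d+")


def find_reassignments(lines, window=timedelta(seconds=5)):
    """Return (app_key, gap_seconds) for each container assigned to an
    application within `window` after that application lost a container
    to preemption."""
    last_preempted = {}  # app_key -> time of its most recent preemption
    hits = []
    for line in lines:
        ts_match = TS.match(line)
        if not ts_match:
            continue  # skip lines without a leading timestamp
        ts = datetime.strptime(ts_match.group(1), "%Y-%m-%d %H:%M:%S,%f")
        preempted = PREEMPTED.search(line)
        if preempted:
            last_preempted[preempted.group(1)] = ts
            continue
        assigned = ASSIGNED.search(line)
        if assigned and assigned.group(1) in last_preempted:
            gap = ts - last_preempted[assigned.group(1)]
            if timedelta(0) <= gap <= window:
                hits.append((assigned.group(1), gap.total_seconds()))
    return hits


# Two lines copied from the log excerpts in this report.
sample = [
    "2016-04-07 21:08:13,463 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptMetrics (FairSchedulerUpdateThread): Non-AM container preempted, current appAttemptId=appattempt_1460047303577_0002_000001, containerId=container_1460047303577_0002_01_001038, resource=<memory:15264, vCores:1>",
    "2016-04-07 21:08:13,844 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerNode (ResourceManager Event Processor): Assigned container container_1460047303577_0002_01_001039 of capacity <memory:15264, vCores:1> on host ip-10-12-40-63.us-west-2.compute.internal:8041, which has 13 containers, <memory:241248, vCores:18> used and <memory:416, vCores:46> available after allocation",
]

print(find_reassignments(sample))
```

On the two sample lines, the preemption at 21:08:13,463 and the assignment at 21:08:13,844 are 381 ms apart, so application 0002 is flagged. Running the same function over the attached resourcemanager.log would show how often this happens; that file is not reproduced here.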

As a result, new applications cannot even obtain an ApplicationMaster (AM) container, so they never start at all.

Attachments

resourcemanager.log (86 kB), attached 07/Apr/16 23:05 by Miles Crawford

Issue Links

is part of: YARN-4752 FairScheduler should preempt for a ResourceRequest and all preempted containers should be on the same node (Resolved)

People

Assignee: Unassigned

Reporter: Miles Crawford

Votes: 0

Watchers: 9

Dates

Created: 07/Apr/16 23:04

Updated: 21/Aug/18 08:16