[YARN-3997] An Application requesting multiple core containers can't preempt running application made of single core containers - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Critical
Resolution: Duplicate
Affects Version/s: 2.7.1
Fix Version/s: None
Component/s: fairscheduler
Labels: None
Environment: Ubuntu 14.04, Hadoop 2.7.1, Physical Machines
Target Version/s: 2.9.0

Description

When our cluster is configured with preemption and is fully loaded by an application consuming 1-core containers, the scheduler will not kill off any of these containers when a new application starts and requests containers larger than 1 core, for example 4-core containers.

When the "second" application requests 1-core containers as well, preemption proceeds as planned and everything works properly.

My assumption is that the fair scheduler recognizes it needs to kill off some containers to make room for the new application, but fails to find a SINGLE container that satisfies the request for a 4-core container (since all existing containers are 1-core containers). It isn't "smart" enough to realize that it needs to kill off 4 single-core containers (in this case) on a single node for the new application to be able to proceed.
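The suspected gap can be sketched in a few lines of Python. This is a toy model of the assumption above, not FairScheduler code: the `Container` class and both function names are hypothetical, and the model assumes nodes have no free capacity, so the whole request must be covered by preempted containers.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Container:
    container_id: str  # hypothetical identifier, for illustration only
    node: str
    vcores: int


def find_single_victim(containers: List[Container], needed_vcores: int) -> Optional[Container]:
    """Suspected behaviour: look for ONE container at least as large as the request."""
    for c in containers:
        if c.vcores >= needed_vcores:
            return c
    return None


def find_victims_on_one_node(containers: List[Container], needed_vcores: int) -> List[Container]:
    """Desired behaviour: pick containers from a single node whose combined
    vcores cover the request. Returns [] if no single node can cover it."""
    by_node: Dict[str, List[Container]] = {}
    for c in containers:
        by_node.setdefault(c.node, []).append(c)

    for node_containers in by_node.values():
        chosen: List[Container] = []
        total = 0
        # Take the largest containers first so that fewer of them are killed.
        for c in sorted(node_containers, key=lambda x: x.vcores, reverse=True):
            chosen.append(c)
            total += c.vcores
            if total >= needed_vcores:
                return chosen
    return []


# Two nodes, each fully occupied by four 1-core containers.
running = [Container(f"c{n}_{i}", f"node{n}", 1) for n in range(2) for i in range(4)]

single = find_single_victim(running, 4)
victims = find_victims_on_one_node(running, 4)
print("single victim:", single)
print("same-node victims:", [v.container_id for v in victims])
```

In this model the single-container search finds nothing for a 4-core request, while the same-node search returns four 1-core containers from one node, which is the behaviour the new application would need.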

The observed effect is that the new application hangs indefinitely and never gets the resources it requires.

This can easily be replicated with any YARN application.
Our "go-to" scenario in this case is running PySpark with 1-core executors (containers) while trying to launch the H2O.ai framework, which INSISTS on having at least 4 cores per container.

Issue Links

is part of

YARN-4752 FairScheduler should preempt for a ResourceRequest and all preempted containers should be on the same node

Resolved

People

Assignee: Arun Suresh

Reporter: Dan Shechter

Votes: 3

Watchers: 16

Dates

Created: 30/Jul/15 11:35

Updated: 31/Aug/16 03:35

Resolved: 31/Aug/16 03:35