[YARN-4148] When killing app, RM releases app's resource before they are released by NM - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 2.8.0, 2.9.0, 3.0.0-alpha2
Component/s: resourcemanager
Labels: None
Target Version/s: 2.8.0
Hadoop Flags: Reviewed

Description

When an app is killed, the RM scheduler releases the app's resources immediately and may then allocate them to new requests. However, the NM may not have released those resources yet.

The problem was found while we were adding support for GPU as a resource (~~YARN-4122~~). Test environment: an NM had 6 GPUs, app A was using all 6, and app B was requesting 3. When app A was killed, the RM released A's 6 GPUs and allocated 3 of them to B. But when B tried to start its container on the NM, the NM did not have 3 free GPUs to allocate because it had not yet released A's GPUs.

I think the same problem exists for CPU and memory. It could cause an OOM if the node's memory ends up overcommitted.
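
The scenario above can be reproduced with a toy model of one node's GPU accounting. This is only a sketch: `NodeGpuModel` and all of its methods are hypothetical names invented for illustration, not YARN classes, and the `waitForNodeRelease` flag shows one possible way to close the gap, not necessarily what the attached patches do. It assumes the RM and NM each keep their own view of which containers hold GPUs.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model (hypothetical, not a YARN class) of one node's GPUs as seen by RM and NM. */
final class NodeGpuModel {
    private final int total;
    private final boolean waitForNodeRelease;
    // Containers the RM scheduler still counts against this node.
    private final Map<String, Integer> rmHeld = new HashMap<>();
    // Containers that physically hold GPUs on the NM.
    private final Map<String, Integer> nmHeld = new HashMap<>();

    NodeGpuModel(int total, boolean waitForNodeRelease) {
        this.total = total;
        this.waitForNodeRelease = waitForNodeRelease;
    }

    private static int sum(Map<String, Integer> held) {
        return held.values().stream().mapToInt(Integer::intValue).sum();
    }

    int rmAvailable() { return total - sum(rmHeld); }
    int nmAvailable() { return total - sum(nmHeld); }

    /** RM-side allocation: succeeds if the RM's view of the node has room. */
    boolean allocate(String id, int gpus) {
        if (gpus > rmAvailable()) return false;
        rmHeld.put(id, gpus);
        return true;
    }

    /** NM-side launch: succeeds only if the GPUs are physically free. */
    boolean launch(String id) {
        Integer gpus = rmHeld.get(id);
        if (gpus == null || gpus > nmAvailable()) return false;
        nmHeld.put(id, gpus);
        return true;
    }

    /** The kill reaches the RM; the NM has not cleaned up yet. */
    void kill(String id) {
        // Eager mode frees the RM's view at once. Deferred mode keeps counting
        // a container that is still running on the node.
        if (!waitForNodeRelease || !nmHeld.containsKey(id)) rmHeld.remove(id);
    }

    /** The NM reports that the container is gone and its GPUs are free. */
    void nodeReportsReleased(String id) {
        nmHeld.remove(id);
        rmHeld.remove(id);
    }

    public static void main(String[] args) {
        NodeGpuModel eager = new NodeGpuModel(6, false);
        eager.allocate("A", 6);
        eager.launch("A");
        eager.kill("A");
        System.out.println("eager: RM allocates B = " + eager.allocate("B", 3));
        System.out.println("eager: NM launches B  = " + eager.launch("B"));

        NodeGpuModel deferred = new NodeGpuModel(6, true);
        deferred.allocate("A", 6);
        deferred.launch("A");
        deferred.kill("A");
        System.out.println("deferred: RM allocates B before NM release = "
                + deferred.allocate("B", 3));
        deferred.nodeReportsReleased("A");
        System.out.println("deferred: RM allocates B after NM release  = "
                + deferred.allocate("B", 3));
        System.out.println("deferred: NM launches B = " + deferred.launch("B"));
    }
}
```

In eager mode the RM hands B its 3 GPUs while the NM still has all 6 tied up by A, so B's launch fails, which matches the behaviour observed in the test environment. In deferred mode the RM keeps counting A's GPUs until the node confirms the release, so B is allocated only once it can actually start.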

Attachments

YARN-4148-branch-2.8.003.patch (10/Jan/17 22:06, 22 kB, Jason Darrell Lowe)
YARN-4148.wip.patch (15/Sep/15 08:43, 18 kB, Jun Gong)
YARN-4148.003.patch (12/Dec/16 21:09, 21 kB, Jason Darrell Lowe)
YARN-4148.002.patch (21/Nov/16 22:14, 21 kB, Jason Darrell Lowe)
YARN-4148.001.patch (16/Sep/15 16:54, 21 kB, Jun Gong)
free_in_scheduler_but_not_node_prototype-branch-2.7.patch (23/Jun/16 15:07, 13 kB, Jason Darrell Lowe)

Issue Links

is duplicated by: YARN-5290 ResourceManager can place more containers on a node than the node size allows (Resolved)
relates to: YARN-5197 RM leaks containers if running container disappears from node update (Closed)

People

Assignee: Jason Darrell Lowe
Reporter: Jun Gong
Votes: 0
Watchers: 20

Dates

Created: 11/Sep/15 14:04
Updated: 25/Oct/19 20:26
Resolved: 11/Jan/17 02:11