[YARN-11194] FairShare preemption doesn't enforce fairness between siblings in some cases - ASF JIRA

Details

Type: Bug
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: 3.2.1
Fix Version/s: None
Component/s: fairscheduler, scheduler preemption
Labels: None
Environment: Hadoop YARN 3.2.1

Description

Queue hierarchy:

root (cluster: 1000GB, 1000 vcores)
- q1 (maxResources: 10GB, 10 vcores)
  - q1.1 (weight: 1)
  - q1.2 (weight: 9)
- q2
- q3

Steps:

1. app1 with a demand of 100GB/100 vcores is added to q1.1 => it gets 10GB/10 vcores
   - q1 reaches its max
2. app2 with a demand of 1000GB/1000 vcores is added to q2 => it gets 990GB/990 vcores
   - the cluster now runs at 100% capacity
3. app3 with a demand of 100GB/100 vcores is added to q1.2 => ...
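The allocation sequence in these steps can be replayed with a toy greedy allocator. This is a simplified sketch, not FairScheduler code: it assumes memory and vcores move together (1 unit = 1GB + 1 vcore) and that each app is simply granted min(demand, queue headroom, free cluster capacity). `allocate` and the other names are hypothetical.

```python
# Toy model of the reported steps; not YARN FairScheduler code.
# Assumption: memory and vcores move together, so 1 unit = 1GB + 1 vcore.
CLUSTER = 1000         # root capacity, in units
Q1_MAX = 10            # q1 maxResources, in units
NO_MAX = float("inf")  # q2 has no max in the report


def allocate(demand, queue_headroom, cluster_free):
    """Grant no more than the app asks for, its queue allows, or the cluster has free."""
    return min(demand, queue_headroom, cluster_free)


used = {"app1": 0, "app2": 0, "app3": 0}

# Step 1: app1 (q1.1) asks for 100 units but is capped by q1's max.
used["app1"] = allocate(100, Q1_MAX, CLUSTER)

# Step 2: app2 (q2, no max) takes whatever is left in the cluster.
free = CLUSTER - sum(used.values())
used["app2"] = allocate(1000, NO_MAX, free)

# Step 3: app3 (q1.2) finds both q1 and the cluster full.
free = CLUSTER - sum(used.values())
q1_headroom = Q1_MAX - used["app1"] - used["app3"]
used["app3"] = allocate(100, q1_headroom, free)

print(used)  # {'app1': 10, 'app2': 990, 'app3': 0}
```

Without preemption nothing frees up for app3; the open question is which app that preemption should take containers from.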

Expected behavior: fair share preemption preempts containers from app1 (q1.1) so that app3 (q1.2) gets 9GB/9 vcores, according to the weights.
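The 9GB/9 vcores figure comes from dividing q1's capped resources between its children by weight: q1.2 should get 9 / (1 + 9) of 10GB/10 vcores. A minimal sketch of that arithmetic, assuming the whole of q1's max is the amount being divided and ignoring other FairScheduler inputs such as minResources; `weighted_shares` is a hypothetical helper, not a YARN API.

```python
from fractions import Fraction


def weighted_shares(total, weights):
    """Split `total` among sibling queues in proportion to their weights.

    Assumes every sibling demands at least its proportional share, so no
    share is capped by demand (true here: app1 and app3 each want 100 units).
    """
    weight_sum = sum(weights.values())
    return {name: Fraction(total) * w / weight_sum for name, w in weights.items()}


# q1 is capped at 10 units (10GB / 10 vcores); both dimensions split the same way.
shares = weighted_shares(10, {"q1.1": 1, "q1.2": 9})
print({name: int(s) for name, s in shares.items()})  # {'q1.1': 1, 'q1.2': 9}
```

With app1 holding all 10 units against a share of 1, it is 9 units over its share, which matches the 9GB/9 vcores app3 should receive through preemption.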

Observed behavior: app3 is starving

Some observations:

- We see some preemption from app2 (q2) that matches app3's starvation (9GB/9 vcores in this case). This may suggest that containers are preempted from app2 on behalf of app3, but app3 can't use them due to this check. Also, if the container chosen for preemption is random, it is far more likely to come from app2 than from app1, given their allocation sizes.
- Removing the max on q1 helps resolve the issue, but we need to keep the max.
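The hypothesis in the first observation can be illustrated with a toy model. Two assumptions here are not taken from YARN's source: all containers have equal size and the victim is picked uniformly at random, and a container freed on app3's behalf is only usable if q1 stays within its max. `pick_probability` and `q1_can_assign` are hypothetical names.

```python
# Toy model of the reporter's hypothesis; not FairScheduler's actual victim selection.
# Units as before: 1 unit = 1GB + 1 vcore.
ALLOCATED = {"app1": 10, "app2": 990}  # what each app holds when app3 arrives
Q1_MAX = 10
Q1_USED = ALLOCATED["app1"]            # only app1 runs in q1 so far


def pick_probability(app, allocated):
    """Chance that a uniformly random container belongs to `app`,
    assuming all containers have the same size."""
    return allocated[app] / sum(allocated.values())


def q1_can_assign(request, used=Q1_USED, cap=Q1_MAX):
    """A container for app3 fits only if q1 stays within its maxResources."""
    return used + request <= cap


print(pick_probability("app2", ALLOCATED))  # 0.99
print(q1_can_assign(1))                     # False: q1 is already at its max
# Preempting one unit from app1 instead lowers q1's usage, so app3 could fit.
print(q1_can_assign(1, used=Q1_USED - 1))   # True
```

Under these assumptions, 99% of randomly chosen victims come from app2, and none of the freed resources can go to app3 while q1 is full; only preemption from app1 makes room inside q1.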

Notes:

- This is an oversimplified version of our production setup. I can provide more details if needed.
- I have a heap dump of the issue that I can't share because of our policy, but I can look up some state if needed.
- My co-worker reported a bug for the same issue, YARN-11171; please feel free to close it as a duplicate.

Thanks!

People

Assignee: Unassigned
Reporter: Dmitry
Votes: 0
Watchers: 1

Dates

Created: 23/Jun/22 01:18
Updated: 23/Jun/22 03:56