Details
- Bug
- Status: Resolved
- Major
- Resolution: Duplicate
- 2.6.0
- None
- None
Description
I hit this problem in our cluster: it causes a livelock between preemption and scheduling.
The queue hierarchy is as follows:
              root
           /    |    \
    queue-1  queue-2  queue-3
     /    \
queue-1-1  queue-1-2
- Assume the cluster has 100G of memory.
- Assume queue-1 has a max resource limit of 20G.
- queue-1-1 becomes active and gets at most 20G of memory (equal to its fair share).
- queue-2 then becomes active and requires 30G of memory (less than its fair share).
- queue-3 becomes active and can be assigned all the remaining resources, 50G of memory (more than its fair share). At this point the fair shares of the three top-level queues are (20, 40, 40) and their usage is (20, 30, 50).
- queue-1-2 becomes active, which triggers a new preemption request (10G of memory; intuitively, it can only preempt from its sibling queue-1-1).
- In fact, preemption starts from root, finds that queue-3 is the most over its fair share, and preempts some resources from queue-3.
- During scheduling, however, the scheduler finds that queue-1 has already reached its max fair share and cannot assign the freed resources to it. The resources are then assigned to queue-3 again.
The last two steps then repeat indefinitely.
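The cycle described above can be reproduced with a small toy model. This is only a sketch of the scenario, not FairScheduler code: the class and method names are hypothetical, and it assumes equal weights for the three top-level queues, a fixed 10G preemption request per round, and that queue-2 has no demand beyond the 30G it already holds.

```java
import java.util.Arrays;

// Toy model of the scenario in this issue; all names are hypothetical.
class PreemptionLivelockSketch {
    static final int CLUSTER_GB = 100;
    static final int QUEUE1_MAX_GB = 20;

    /** Fair shares of queue-1, queue-2, queue-3: queue-1 is capped, the rest is split evenly. */
    static int[] fairShares() {
        int q1 = Math.min(CLUSTER_GB / 3, QUEUE1_MAX_GB); // equal split, capped at 20G
        int rest = (CLUSTER_GB - q1) / 2;                 // 80G shared by queue-2 and queue-3
        return new int[] {q1, rest, rest};
    }

    /** Index of the top-level queue that is furthest above its fair share. */
    static int mostOverFairShare(int[] usage, int[] fair) {
        int best = 0;
        for (int i = 1; i < usage.length; i++) {
            if (usage[i] - fair[i] > usage[best] - fair[best]) {
                best = i;
            }
        }
        return best;
    }

    /** Runs the preempt/schedule cycle and returns the usage of the three queues afterwards. */
    static int[] simulate(int rounds, int requestGb) {
        int[] fair = fairShares();
        int[] usage = {20, 30, 50}; // usage when queue-1-2 becomes active
        for (int r = 0; r < rounds; r++) {
            // Preemption starts from root and picks the most over-fair-share queue.
            int victim = mostOverFairShare(usage, fair);
            usage[victim] -= requestGb;

            // Scheduling: queue-1 may only grow while it stays within its max limit.
            if (usage[0] + requestGb <= QUEUE1_MAX_GB) {
                usage[0] += requestGb;
            } else {
                // Assumption: queue-2's demand is already met, so queue-3 takes the memory back.
                usage[2] += requestGb;
            }
        }
        return usage;
    }

    public static void main(String[] args) {
        System.out.println("fair shares: " + Arrays.toString(fairShares()));
        System.out.println("usage after 5 rounds: " + Arrays.toString(simulate(5, 10)));
    }
}
```

In this model the usage is unchanged after every round, because the memory preempted from queue-3 can never be placed in queue-1 and so returns to queue-3. The pending request from queue-1-2 is never satisfied, which is the livelock reported here.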
Issue Links
- duplicates: YARN-3405 FairScheduler's preemption cannot happen between sibling in some case (Resolved)