Affects Version/s: None
Fix Version/s: None
Component/s: capacity scheduler
This is a 5-node cluster with 15GB total capacity.
1) Configure the Capacity Scheduler and set the max cluster priority to 10.
2) Launch app1 with no priority and wait for it to occupy the full cluster.
application_1558135983180_0001 is launched with Priority=0
3) Launch app2 with priority=2 and check that it is in ACCEPTED state.
application_1558135983180_0002 is launched with Priority=2
4) Launch app3 with priority=3 and check that it is in ACCEPTED state.
application_1558135983180_0003 is launched with Priority=3
5) Kill a container from app1.
6) Verify that app3, having the higher priority, goes to RUNNING state.
Instead, with max-application-master-percentage set to 0.1, app2 goes to RUNNING state even though app3 has higher priority.
In CS LeafQueue, there are two ordering lists:
If the queue's total application-master usage is below maxAMResourcePerQueuePercent, the app is added to the "ordering-policy" list.
Otherwise, the app is added to the "pending-ordering-policy" list.
During allocation, only apps in "ordering-policy" are considered.
When any app finishes, the queue config changes, or a node is added/removed, "pending-ordering-policy" is re-evaluated, and some apps are promoted from "pending-ordering-policy" to "ordering-policy".
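The two-list behavior described above can be sketched as a toy model. This is a simplified Python illustration, not the actual Java LeafQueue code: the class and method names are invented, and every AM is assumed to need exactly 1GB.

```python
import math

class LeafQueueModel:
    """Toy model of the CS LeafQueue dual ordering lists (illustrative only)."""

    def __init__(self, cluster_gb, max_am_percent, min_alloc_gb=1):
        # AM headroom = cluster * percent, rounded up to the minimum allocation.
        self.am_limit_gb = math.ceil(cluster_gb * max_am_percent / min_alloc_gb) * min_alloc_gb
        self.am_used_gb = 0
        self.ordering_policy = []          # apps eligible for allocation
        self.pending_ordering_policy = []  # apps waiting for AM headroom

    def submit(self, app, am_gb=1):
        # Under the AM limit -> active list; otherwise -> pending list.
        if self.am_used_gb + am_gb <= self.am_limit_gb:
            self.am_used_gb += am_gb
            self.ordering_policy.append(app)
        else:
            self.pending_ordering_policy.append(app)

    def on_app_finished(self, app, am_gb=1):
        # Only app completion (or config/node changes) triggers re-activation;
        # note there is deliberately no hook for "container released".
        self.ordering_policy.remove(app)
        self.am_used_gb -= am_gb
        self._activate_pending()

    def _activate_pending(self):
        # Promote pending apps (assumed 1GB AMs) while headroom remains.
        while self.pending_ordering_policy and self.am_used_gb + 1 <= self.am_limit_gb:
            self.am_used_gb += 1
            self.ordering_policy.append(self.pending_ordering_policy.pop(0))

# Replaying the scenario from this JIRA:
q = LeafQueueModel(cluster_gb=15, max_am_percent=0.1)  # AM limit = 2GB
q.submit("app1")
q.submit("app2")  # both fit under the 2GB AM limit -> "ordering-policy"
q.submit("app3")  # over the limit -> "pending-ordering-policy"
```

Because the model has no activation hook for a released container, app3 stays pending until app1 fully finishes, which mirrors the reported behavior.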
This behavior leads to the issue reported in this JIRA:
The cluster has 15GB of resources and max-application-master-percentage is set to 0.1, so the queue can accept at most 2GB of AM resource (15GB * 0.1 = 1.5GB, rounded up to the 1GB minimum allocation), which equals 2 applications.
When app2 is submitted, it is added to "ordering-policy".
When app3 is submitted, it is added to "pending-ordering-policy".
When we kill app1, it does not finish immediately; it remains part of "ordering-policy" until all of app1's containers are released, which keeps app3 in "pending-ordering-policy".
So app3 cannot pick up any resource released by app1.
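As a sanity check of the arithmetic and of the event gap, here is a hedged Python sketch. The ACTIVATION_EVENTS set merely restates the re-evaluation triggers listed earlier in this description; it is not a real YARN API.

```python
import math

CLUSTER_GB = 15
MAX_AM_PERCENT = 0.1
MIN_ALLOC_GB = 1

# AM headroom: 15GB * 0.1 = 1.5GB, rounded up to the 1GB minimum allocation -> 2GB,
# i.e. room for two 1GB AMs (app1 and app2).
am_limit_gb = math.ceil(CLUSTER_GB * MAX_AM_PERCENT / MIN_ALLOC_GB) * MIN_ALLOC_GB

# Events that cause "pending-ordering-policy" to be re-evaluated, per the
# description above (names are illustrative, not actual YARN event types):
ACTIVATION_EVENTS = {"app_finished", "queue_config_changed", "node_added_or_removed"}

# Killing a container of app1 releases resources but is not an activation event,
# so app3 stays in "pending-ordering-policy" and cannot pick those resources up.
container_release_triggers_activation = "container_released" in ACTIVATION_EVENTS
```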