queueMaxAppsDefault sets the default running-app limit for all queues (including the root queue); an individual queue can override it with its own maxRunningApps setting.
Consider a simple FairScheduler allocation XML with the following settings:
- queueMaxAppsDefault is set to 3.
- The root queue has no maxRunningApps limit set.
- The child queues set maxRunningApps to 15 (root.A) and 10 (root.B).
With this configuration, users submitting jobs to root.A are (incorrectly) capped at 3 running apps, not 15, because the root queue (their parent) is itself capped at 3 by the queueMaxAppsDefault setting. Likewise, root.B is capped at 3 instead of 10.
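The capping can be illustrated with a simplified model. This is a sketch, not FairScheduler's actual (Java) code: it assumes an app can run only if every queue on its path is below its limit, and the names `own_limit`, `path`, and `effective_limit` are hypothetical.

```python
from typing import Dict, List, Optional

# Hypothetical model of the allocation settings above (not FairScheduler code).
QUEUE_MAX_APPS_DEFAULT = 3

# Explicit maxRunningApps per queue; None means the setting is absent.
MAX_RUNNING_APPS: Dict[str, Optional[int]] = {
    "root": None,
    "root.A": 15,
    "root.B": 10,
}


def path(queue: str) -> List[str]:
    """Return the queue and its ancestors, root first: 'root.A' -> ['root', 'root.A']."""
    parts = queue.split(".")
    return [".".join(parts[: i + 1]) for i in range(len(parts))]


def own_limit(queue: str) -> int:
    """A queue's own limit: its maxRunningApps if set, else queueMaxAppsDefault."""
    explicit = MAX_RUNNING_APPS.get(queue)
    return explicit if explicit is not None else QUEUE_MAX_APPS_DEFAULT


def effective_limit(leaf: str) -> int:
    """Assumption: an app runs only if every queue on its path is under its limit,
    so no leaf can run more apps than the smallest limit along its path."""
    return min(own_limit(q) for q in path(leaf))


for leaf in ("root.A", "root.B"):
    print(f"{leaf}: configured={own_limit(leaf)} effective={effective_limit(leaf)}")
```

This prints `configured=15 effective=3` for root.A and `configured=10 effective=3` for root.B. Because the root limit is shared, apps in root.A and root.B together cannot exceed 3 in this model.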
As a result, users see their apps stuck in the ACCEPTED state.
Either the ResourceManager should have rejected this FairScheduler XML, or the root queue should have been capped at the highest maxRunningApps value defined for any leaf queue.
Possible solution: if the root queue has no maxRunningApps set and queueMaxAppsDefault is lower than the maxRunningApps of an individual leaf queue, the root queue should implicitly be capped at the highest such leaf value instead of at queueMaxAppsDefault.
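The proposed rule could be sketched as follows. This is a hypothetical model, not a patch against FairScheduler: the functions `implicit_root_limit` and `effective_limit_proposed` are illustrative names, and the "every queue on the path must have room" assumption is a simplification of how running-app limits are enforced.

```python
from typing import Dict, List, Optional


def path(queue: str) -> List[str]:
    """Return the queue and its ancestors, root first: 'root.A' -> ['root', 'root.A']."""
    parts = queue.split(".")
    return [".".join(parts[: i + 1]) for i in range(len(parts))]


def is_leaf(queue: str, queues: Dict[str, Optional[int]]) -> bool:
    """A queue is a leaf if no other configured queue is nested under it."""
    return not any(other.startswith(queue + ".") for other in queues)


def implicit_root_limit(explicit: Dict[str, Optional[int]], default: int) -> int:
    """Proposed rule: an explicit root maxRunningApps always wins; otherwise
    raise root's cap from queueMaxAppsDefault to the highest leaf limit."""
    root = explicit.get("root")
    if root is not None:
        return root
    leaf_limits = [
        limit for q, limit in explicit.items()
        if limit is not None and is_leaf(q, explicit)
    ]
    return max([default] + leaf_limits)


def effective_limit_proposed(
    leaf: str, explicit: Dict[str, Optional[int]], default: int
) -> int:
    """Smallest limit along the leaf's path, using the implicit root cap."""
    limits = []
    for q in path(leaf):
        if q == "root":
            limits.append(implicit_root_limit(explicit, default))
        else:
            own = explicit.get(q)
            # Non-root queues keep the existing fallback to queueMaxAppsDefault.
            limits.append(own if own is not None else default)
    return min(limits)


config = {"root": None, "root.A": 15, "root.B": 10}
print(implicit_root_limit(config, 3))
print(effective_limit_proposed("root.A", config, 3))
print(effective_limit_proposed("root.B", config, 3))
```

With the example configuration, root's implicit cap becomes 15, so root.A can reach 15 and root.B 10 (apps in both still share root's total of 15). As in the proposal, only root is adjusted; an intermediate parent queue with no maxRunningApps would still fall back to queueMaxAppsDefault in this sketch.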