FairScheduler has hierarchical queues, but fair share calculation and
preemption still work only within a limited range and are effectively non-hierarchical.
This patch addresses that gap in two respects:
1. Currently MinShare is not propagated to the parent queue, so the
fair share calculation ignores all min shares set in deeper queues.
Let's take an example
(implemented as test case TestFairScheduler#testMinShareInHierarchicalQueues)
Then bigApp is started within queue1.big with 10x1GB containers.
That effectively consumes all of the maximum resources allowed for queue1.
Subsequent requests for app1 (queue1.sub1.sub11) and
app2 (queue1.sub2) (5x1GB each) have to wait for free resources.
Note that sub11 has a min share requirement of 6x1GB.
Without this patch, fair share is calculated with no knowledge
of the min share requirement, and app1 and app2 get an equal
number of containers.
With the patch, resources are split according to min share (in the test,
5 for app1 and 1 for app2).
This behaviour is controlled by the same parameter, ‘globalPreemption’,
but that can be changed easily.
The implementation is a bit awkward, but it seems that the method for min share
recalculation could be exposed as a public or protected API, so that the constructor
in FSQueue could call it before using the minShare getter. For now, the
current implementation with nulls should work too.
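The idea of propagating min share upwards can be sketched as follows. This is only an illustration, not the code from the patch: the Queue class, its fields, and aggregatedMinShareGb() are hypothetical names, and the aggregation rule (the larger of a queue's own min share and the sum over its children) is an assumption made for the example.

```java
import java.util.ArrayList;
import java.util.List;

final class MinShareSketch {
    /** Hypothetical stand-in for a scheduler queue; NOT the real FSQueue. */
    static final class Queue {
        final String name;
        final int ownMinShareGb; // min share configured directly on this queue
        final List<Queue> children = new ArrayList<>();

        Queue(String name, int ownMinShareGb) {
            this.name = name;
            this.ownMinShareGb = ownMinShareGb;
        }

        Queue add(Queue child) {
            children.add(child);
            return this;
        }

        /**
         * Assumed rule: a queue's effective min share is the larger of its own
         * setting and the sum of its children's effective min shares.
         */
        int aggregatedMinShareGb() {
            int fromChildren = 0;
            for (Queue child : children) {
                fromChildren += child.aggregatedMinShareGb();
            }
            return Math.max(ownMinShareGb, fromChildren);
        }
    }

    public static void main(String[] args) {
        // Queue names follow the example above; only sub11 declares a min share.
        Queue sub11 = new Queue("queue1.sub1.sub11", 6);
        Queue sub1 = new Queue("queue1.sub1", 0).add(sub11);
        Queue sub2 = new Queue("queue1.sub2", 0);
        Queue queue1 = new Queue("queue1", 0).add(sub1).add(sub2);

        for (Queue q : List.of(queue1, sub1, sub2, sub11)) {
            System.out.println(q.name + " minShare=" + q.aggregatedMinShareGb() + "GB");
        }
    }
}
```

With this rule sub1 inherits the 6GB requirement of sub11 while sub2 stays at zero, so a fair share calculation run at the queue1 level can see the difference between the two subtrees.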
2. Preemption doesn’t work between queues on different levels of the
queue hierarchy. Moreover, it is not possible to override various
parameters for child queues.
This patch adds the parameter ‘globalPreemption’, which enables the global
preemption algorithm modifications.
In a nutshell, the patch adds the function shouldAttemptPreemption(queue),
which calculates usage for nested queues; if a queue with usage above
the specified threshold is found, preemption can be triggered.
The aggregated minShare does the rest of the work, and preemption works
as expected within a hierarchy of queues with different MinShare/MaxShare
specifications on different levels.
Test case TestFairScheduler#testGlobalPreemption shows how it works.
One big app gets resources above its fair share, and app1 has a declared
min share. On submission, the code detects that starvation and preempts enough
containers to make room for app1.
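A rough sketch of the per-queue check described in item 2 is given below. It is not the patch code: the Queue class, its capacityGb and ownUsageGb fields, and the way the threshold is passed in are all hypothetical, and the real shouldAttemptPreemption(queue) may compute usage differently.

```java
import java.util.ArrayList;
import java.util.List;

final class PreemptionSketch {
    /** Hypothetical stand-in for a scheduler queue; NOT the real FSQueue. */
    static final class Queue {
        final String name;
        final int capacityGb; // assumed: resources this queue is allowed to use
        final int ownUsageGb; // usage by apps running directly in this queue
        final List<Queue> children = new ArrayList<>();

        Queue(String name, int capacityGb, int ownUsageGb) {
            this.name = name;
            this.capacityGb = capacityGb;
            this.ownUsageGb = ownUsageGb;
        }

        Queue add(Queue child) {
            children.add(child);
            return this;
        }

        /** Usage of this queue plus everything nested below it. */
        int usageGb() {
            int total = ownUsageGb;
            for (Queue child : children) {
                total += child.usageGb();
            }
            return total;
        }
    }

    /**
     * Returns true if this queue or any nested queue uses more than
     * the given fraction of its (assumed) capacity.
     */
    static boolean shouldAttemptPreemption(Queue queue, double threshold) {
        if (queue.capacityGb > 0
                && (double) queue.usageGb() / queue.capacityGb > threshold) {
            return true;
        }
        for (Queue child : queue.children) {
            if (shouldAttemptPreemption(child, threshold)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Queue big = new Queue("queue1.big", 10, 10);
        Queue queue1 = new Queue("queue1", 10, 0).add(big);
        Queue root = new Queue("root", 20, 0).add(queue1);
        // root itself is only half used, but queue1 is saturated.
        System.out.println(shouldAttemptPreemption(root, 0.8));
    }
}
```

The point of walking the tree is that a check against the top-level queue alone would miss a saturated nested queue such as queue1 in this example.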