I've looked at this, it's now in the git repo as feature/
I can see that it works, but not very well, and may not work if there is >1 role trying to be placed.
The blacklist isn't per-priority, it applies to every role: you can't safely request >1 role type at the same time. Specifically,
if I were trying to place the "hbase worker" role and had blacklisted all but one node, a request for "hbase master" could pick up
that same blacklist and fail to get placed.
Even if it were per-role, we can't stop >1 container being allocated on the same node.
What I do like is the brute force "reject all non-affine allocations" in onContainerAllocated(). It's inefficient, but, provided we are eventually allocated
containers on different nodes, it will succeed.
Two problems with it:
- there are no guarantees in the scheduler that you don't get the same container back; hence the need for blacklisting.
- preemption means that being given containers that are then released is very expensive: other
people's work is lost.
Which makes me realise that yes, you do need to use the blacklist, as your code does.
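For reference, here's a minimal sketch of the brute-force rejection idea: keep an allocation only when its host isn't already in use, otherwise release it and ask again. The class and method names (AntiAffineFilter, onContainersAllocated, getAccepted, getReleased) are all hypothetical, not the real Slider/YARN callbacks, and containers are modelled as plain host strings.

```java
import java.util.*;

// Illustrative sketch only: the names here are hypothetical stand-ins,
// not the actual YARN AM callback API.
public class AntiAffineFilter {
    private final Set<String> hostsInUse = new HashSet<>();
    private final List<String> accepted = new ArrayList<>();
    private final List<String> released = new ArrayList<>();

    // Brute force: keep a container only if its host is not already in
    // use; otherwise mark it for release and re-request elsewhere.
    public void onContainersAllocated(List<String> containerHosts) {
        for (String host : containerHosts) {
            if (hostsInUse.add(host)) {
                accepted.add(host);   // first container on this node: keep it
            } else {
                released.add(host);   // same node again: release and retry
            }
        }
    }

    public List<String> getAccepted() { return accepted; }
    public List<String> getReleased() { return released; }
}
```

The expensive part is exactly the second release branch: without a blacklist the scheduler is free to hand the same node straight back.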
Anyway, I don't see that we can do it as is. Without YARN helping, the only way that
we can be sure things work is if we ask for exactly one instance of one role at a time.
The algorithm would be:

for each role with requests to make:
    blacklist all nodes that are either failed or already hosting an instance
    request exactly one new node
    wait for that allocation before asking for another instance or moving on to the next role
It would be slow; indeed, if container requests for one role could not be satisfied, all other roles could block. But it would ensure that allocated nodes are anti-affine.
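To make the serialized loop concrete, here's a small simulation of it. Everything is hypothetical: requestOneNode() stands in for a full request/allocate round trip against the RM, and nodes are plain strings. Note the blacklist is rebuilt fresh for each role, which is what avoids the shared-blacklist problem described above.

```java
import java.util.*;

// Simulation of the one-container-at-a-time placement loop; names are
// illustrative stand-ins, not the YARN API.
public class SerializedPlacer {
    private final List<String> cluster;
    private final Set<String> failedNodes;

    public SerializedPlacer(List<String> cluster, Set<String> failedNodes) {
        this.cluster = cluster;
        this.failedNodes = failedNodes;
    }

    // Place `count` instances of one role, exactly one request at a time.
    // The blacklist is per-role, so "hbase master" never inherits the
    // "hbase worker" blacklist.
    public List<String> placeRole(String role, int count) {
        Set<String> blacklist = new HashSet<>(failedNodes);
        List<String> placed = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            String node = requestOneNode(blacklist); // blocks until allocated
            if (node == null) {
                break; // unsatisfiable: this role (and later ones) block here
            }
            placed.add(node);
            blacklist.add(node); // anti-affinity: never ask for this node again
        }
        return placed;
    }

    // Stand-in for the scheduler: return any node not on the blacklist.
    private String requestOneNode(Set<String> blacklist) {
        for (String node : cluster) {
            if (!blacklist.contains(node)) {
                return node;
            }
        }
        return null;
    }
}
```

The cost is visible in the structure: one scheduler round trip per instance, and a single unsatisfiable role stalls everything after it.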