Here is an approach; I have a preliminary patch for it, which I'll put up shortly...
Essentially, we need an extra (boolean?) field per ResourceRequest (RR) called 'blacklist' or, more descriptively, 'do not fall through'.
What this provides is a way to tell the Scheduler to stop its current fall-through scheduling mechanism of node-to-rack-ANY ...
When the flag is set on an RR, the scheduler should not fall through from that RR and look for rack-locality or ANY.
- if you want to blacklist a node, set the flag on the RR of the node
- if you want to blacklist a rack, set the flag on the RR of the rack (obviously do not send any RRs for any nodes in the rack)
- if you want to whitelist a node (or a set of nodes): blacklist ANY, and either don't send any other rack RRs or blacklist all rack RRs (for the nodes you care about).
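
To make the three cases above concrete, here is a rough sketch of one possible reading of the flag. It is not the patch and does not use the real YARN classes: `FallThroughSketch`, `Request`, and `match` are hypothetical names, and the rule that a flagged RR stops the node-to-rack-ANY walk with no allocation at that level is my assumption about the intended semantics.

```java
import java.util.HashMap;
import java.util.Map;

final class FallThroughSketch {
    // Resource name used here for the off-switch (ANY) request.
    static final String ANY = "*";

    // Hypothetical stand-in for one ResourceRequest at a single priority.
    static final class Request {
        final int numContainers;
        final boolean blacklist; // the proposed 'do not fall through' flag

        Request(int numContainers, boolean blacklist) {
            this.numContainers = numContainers;
            this.blacklist = blacklist;
        }
    }

    /**
     * Walks node -> rack -> ANY for a node that is offering capacity.
     * Returns the resource name of the RR that may be satisfied, or null
     * if nothing may be allocated on this node.
     * Assumption: a flagged RR ends the walk with no allocation.
     */
    static String match(Map<String, Request> requests, String node, String rack) {
        for (String level : new String[] {node, rack, ANY}) {
            Request rr = requests.get(level);
            if (rr == null) {
                continue; // no RR at this level: relax locality
            }
            if (rr.blacklist) {
                return null; // flag set: stop, do not fall through
            }
            if (rr.numContainers > 0) {
                return level; // outstanding ask at this level
            }
        }
        return null;
    }

    public static void main(String[] args) {
        Map<String, Request> rrs = new HashMap<>();
        rrs.put("node1", new Request(0, true));  // blacklisted node
        rrs.put("rack1", new Request(2, false));
        rrs.put(ANY, new Request(2, false));

        System.out.println(match(rrs, "node1", "rack1")); // blacklisted node
        System.out.println(match(rrs, "node3", "rack1")); // falls through to rack
    }
}
```

Under this reading, blacklisting a rack is the same walk stopping one level later, and a whitelist is just a flagged ANY with unflagged node RRs.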
Obviously, this works across priorities since each RR is per-priority.
Essentially, this flag provides the ability for the application to ask the scheduler to disable its 'locality relaxation' or 'fall-through'.
This way, we don't radically re-design our protocol in YARN v1 (i.e. hadoop-2.x) and yet provide blacklist/whitelist for individual nodes/racks and for sets of them.
Clearly, we can make this easier for end-users by adding a simple client api with calls like blacklistNodes/whitelistNodes, which sets the flag automatically for each priority and so simplifies the application writer's life.
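
A rough sketch of what such a helper might look like, purely for discussion. Everything here is hypothetical: `LocalityRequests`, its `Request` type, and the `containersPerNode` parameter are illustrative and not part of any existing YARN client api. The whitelist variant takes the "don't send rack RRs" option and assumes the flagged ANY RR still carries the total container count.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

final class LocalityRequests {
    static final String ANY = "*";

    // Hypothetical stand-in for a ResourceRequest carrying the proposed flag.
    static final class Request {
        final int priority;
        final String resourceName;
        final int numContainers;
        final boolean blacklist;

        Request(int priority, String resourceName, int numContainers, boolean blacklist) {
            this.priority = priority;
            this.resourceName = resourceName;
            this.numContainers = numContainers;
            this.blacklist = blacklist;
        }
    }

    // One flagged, zero-container RR per (priority, node) pair.
    static List<Request> blacklistNodes(Collection<Integer> priorities,
                                        Collection<String> nodes) {
        List<Request> out = new ArrayList<>();
        for (int priority : priorities) {
            for (String node : nodes) {
                out.add(new Request(priority, node, 0, true));
            }
        }
        return out;
    }

    // Per priority: unflagged node RRs, no rack RRs, and a flagged ANY RR.
    static List<Request> whitelistNodes(Collection<Integer> priorities,
                                        Collection<String> nodes,
                                        int containersPerNode) {
        List<Request> out = new ArrayList<>();
        for (int priority : priorities) {
            for (String node : nodes) {
                out.add(new Request(priority, node, containersPerNode, false));
            }
            out.add(new Request(priority, ANY, containersPerNode * nodes.size(), true));
        }
        return out;
    }
}
```

The application would call these once and send the returned RRs with its next allocate call, instead of setting the flag by hand at every priority.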
Thoughts? Do folks see any long-term (i.e. hadoop-2.x) negative repercussions from this?