YARN-4189 does an excellent job describing the issues with the current delay scheduling algorithms within the capacity scheduler. The design proposal also seems like a good direction.
This JIRA proposes a simple interim solution to the key issue we've been experiencing on a regular basis:
- rackLocal assignments trickle out due to nodeLocalityDelay. This can have a significant impact on input formats such as CombineFileInputFormat, which targets very specific nodes in its split calculations.
I'm not sure when YARN-4189 will become reality, so I thought an interim patch might make sense. The basic idea is simple:
1) Separate delays for rackLocal and OffSwitch assignments (today there is only one)
2) When we're getting rackLocal assignments, subsequent rackLocal assignments should not be delayed
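To make the two points above concrete, here is a minimal sketch of the idea. It is not the patch itself: the class `LocalityDelaySketch`, its fields, and its methods are hypothetical names invented for illustration and do not correspond to actual capacity scheduler classes. It also assumes delays are counted in missed scheduling opportunities and that the counter resets on every assignment.

```java
// Hypothetical illustration only; none of these names exist in the capacity scheduler.
final class LocalityDelaySketch {
    enum Locality { NODE_LOCAL, RACK_LOCAL, OFF_SWITCH }

    // Point 1: two independent thresholds instead of a single shared delay.
    private final int rackLocalityDelay; // missed opportunities before rackLocal is allowed
    private final int offSwitchDelay;    // missed opportunities before OffSwitch is allowed

    private int missedOpportunities = 0;
    // Point 2: remembers whether the most recent assignment was rackLocal.
    private boolean lastAssignmentWasRackLocal = false;

    LocalityDelaySketch(int rackLocalityDelay, int offSwitchDelay) {
        this.rackLocalityDelay = rackLocalityDelay;
        this.offSwitchDelay = offSwitchDelay;
    }

    /** Returns true if an assignment of the given locality may be made now. */
    boolean canAssign(Locality type) {
        switch (type) {
            case NODE_LOCAL:
                return true; // nodeLocal assignments are never delayed
            case RACK_LOCAL:
                // Once rackLocal assignments are flowing, do not delay the next one.
                return lastAssignmentWasRackLocal
                        || missedOpportunities >= rackLocalityDelay;
            default:
                return missedOpportunities >= offSwitchDelay;
        }
    }

    /** Called when a node is offered but nothing is assigned on it. */
    void recordMissedOpportunity() {
        missedOpportunities++;
    }

    /** Called after a container is assigned (assumption: counter resets on any assignment). */
    void recordAssignment(Locality type) {
        missedOpportunities = 0;
        lastAssignmentWasRackLocal = (type == Locality.RACK_LOCAL);
    }
}
```

With `new LocalityDelaySketch(2, 4)`, the first rackLocal assignment waits for two missed opportunities, but the rackLocal assignments that immediately follow it are allowed without waiting again. A nodeLocal assignment clears the flag, so the rackLocal delay applies once more.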
A patch will be uploaded shortly. No big deal if the consensus is to go straight to YARN-4189.