If you turn delay scheduling off by setting spark.locality.wait=0, you effectively turn off the use of locality preferences when there is a bulk scheduling event. TaskSchedulerImpl will use resources in whatever random order it happens to shuffle them into, rather than taking advantage of the most local options.
This happens because TaskSchedulerImpl offers resources to a TaskSetManager one at a time, each time subject to a maxLocality constraint. However, that constraint doesn't step through all possible locality levels; it only iterates over tsm.myLocalityLevels. And tsm.myLocalityLevels omits a locality level completely if the wait for that level == 0. So with delay scheduling off, TaskSchedulerImpl immediately jumps to giving TaskSetManagers the offers with maxLocality = ANY.
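The effect can be illustrated with a small standalone simulation. This is a simplified sketch, not Spark's code: it models only two levels (NODE_LOCAL and ANY), one slot per offer, and a fixed offer order instead of a random shuffle. The names Locality, valid_locality_levels, and schedule are hypothetical and exist only for this illustration.

```python
from enum import IntEnum


class Locality(IntEnum):
    # Simplified: the real scheduler has more levels than these two.
    NODE_LOCAL = 1
    ANY = 2


def valid_locality_levels(wait_ms):
    """Mimic the described behavior: a level is dropped when its wait is 0."""
    levels = []
    if wait_ms != 0:
        levels.append(Locality.NODE_LOCAL)
    levels.append(Locality.ANY)  # ANY is always present
    return levels


def schedule(preferred_host, offers, wait_ms):
    """Assign tasks to single-slot offers, one pass per locality level.

    preferred_host: dict of task name -> host the task prefers
    offers: list of hosts, each with one free slot, in offer order
    """
    pending = dict(preferred_host)
    free = list(offers)
    placed = {}
    for max_locality in valid_locality_levels(wait_ms):
        for host in list(free):
            local = [t for t, h in pending.items() if h == host]
            if local:
                task = local[0]            # a task that prefers this host
            elif max_locality == Locality.ANY and pending:
                task = next(iter(pending))  # any task, locality ignored
            else:
                continue                    # nothing allowed at this level
            placed[task] = host
            del pending[task]
            free.remove(host)
    return placed


tasks = {"t0": "hostB"}
offers = ["hostA", "hostB"]
print(schedule(tasks, offers, wait_ms=0))  # only the ANY pass runs
print(schedule(tasks, offers, wait_ms=1))  # NODE_LOCAL pass runs first
```

With wait_ms=0 the only pass is ANY, so the first offer (hostA) takes t0 even though hostB is also on offer. With a nonzero wait, the NODE_LOCAL pass runs over all offers first and t0 lands on hostB.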
WORKAROUND: instead of setting spark.locality.wait=0, use spark.locality.wait=1ms. The one downside of this is if you have tasks that actually take less than 1ms. You could even run into SPARK-18886. But that is a relatively unlikely scenario for real workloads.