A common tradeoff in job scheduling is between keeping the cluster busy and enforcing capacity/fairness properties. The FairScheduler and the CapacityScheduler take opposite stances on how to achieve this.
The FairScheduler leverages task-killing to quickly reclaim resources from currently running jobs and redistribute them among new jobs, thus keeping the cluster busy but wasting useful work. The CapacityScheduler is typically tuned
to limit the portion of the cluster used by each queue so that the likelihood of violating capacity is low, thus never wasting work, but at the risk of leaving the cluster underutilized or making jobs wait to obtain their rightful capacity.
By introducing the notion of work-preserving preemption we can remove this tradeoff. This requires a protocol for preemption (YARN-45), ApplicationMasters that can respond to preemption efficiently (e.g., by saving their intermediate state; this will be posted for MapReduce in a separate JIRA soon), and a scheduler that can issue preemption requests (discussed in separate JIRAs YARN-568 and YARN-569).
The changes we track with this JIRA are common to FairScheduler and CapacityScheduler, and consist mostly of propagating preemption decisions through the ApplicationMastersService.
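
To make the difference between kill-based reclamation and work-preserving preemption concrete, here is a minimal toy sketch in plain Java. It does not use any YARN API: the class `WorkPreservingPreemptionSketch`, the `Task` type, and the methods `reclaim`, `preemptThenReclaim`, and `sampleVictims` are all hypothetical names introduced only for illustration, and "work" is modeled as a simple count of units. The assumption is that a task warned before its resources are taken can save all of its progress.

```java
import java.util.ArrayList;
import java.util.List;

public class WorkPreservingPreemptionSketch {

    /** Hypothetical task: tracks work units done and work units saved. */
    static class Task {
        final String id;
        final int unitsDone;   // progress made so far
        int unitsSaved = 0;    // progress that survives losing the container

        Task(String id, int unitsDone) {
            this.id = id;
            this.unitsDone = unitsDone;
        }

        /** Assumed reaction to a preemption request: save intermediate state. */
        void checkpoint() {
            unitsSaved = unitsDone;
        }
    }

    /** Takes the resources away and returns the amount of unsaved work lost. */
    static int reclaim(List<Task> victims) {
        int lost = 0;
        for (Task t : victims) {
            lost += t.unitsDone - t.unitsSaved;
        }
        return lost;
    }

    /** Work-preserving variant: warn each task first, then reclaim. */
    static int preemptThenReclaim(List<Task> victims) {
        for (Task t : victims) {
            t.checkpoint();
        }
        return reclaim(victims);
    }

    /** Two made-up running tasks with 40 and 25 units of progress. */
    static List<Task> sampleVictims() {
        List<Task> victims = new ArrayList<>();
        victims.add(new Task("task-1", 40));
        victims.add(new Task("task-2", 25));
        return victims;
    }

    public static void main(String[] args) {
        System.out.println("work lost, kill only: " + reclaim(sampleVictims()));
        System.out.println("work lost, work-preserving: "
                + preemptThenReclaim(sampleVictims()));
    }
}
```

In this toy model, killing the two tasks outright loses 65 units of work, while warning them first loses none. In a real cluster, checkpointing takes time and may be partial, so the actual saving depends on how efficiently the ApplicationMaster can respond.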