The changes look good, but I thought of one other issue: what should we do when we are asked to lower a node's slot count below the number of tasks currently running on it? In the current version, the scheduler won't launch new tasks on that node until its running task count falls below its slot count. However, if we wanted to use this for rollover, we'd probably want to wait until enough of those tasks have finished before giving a slot to the new JobTracker. I see two ways to do this: either have the process that's scaling down the cluster watch the running tasks before giving the slots to someone else, or add an API that invokes a callback once the number of running tasks on the node has fallen below the target slot count. What are your thoughts on this?
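To make the second option concrete, here is a minimal sketch in Java. Everything in it is hypothetical: `SlotDrainTracker`, `onDrained`, and `taskFinished` are not existing Hadoop APIs, just illustrative names. It assumes a node counts as drained once it runs no more tasks than its target slot count (the exact threshold is an open choice), and it fires each registered callback exactly once when that happens.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch (not an existing Hadoop class): tracks one node's
 * running-task count against a lowered target slot count and notifies
 * listeners once the node has drained down to that target.
 */
class SlotDrainTracker {
    private final int targetSlots;
    private int runningTasks;
    private final List<Runnable> drainCallbacks = new ArrayList<>();

    SlotDrainTracker(int targetSlots, int runningTasks) {
        this.targetSlots = targetSlots;
        this.runningTasks = runningTasks;
    }

    /** True once the node runs no more tasks than its target slot count. */
    synchronized boolean isDrained() {
        return runningTasks <= targetSlots;
    }

    /** Registers a callback; runs it immediately if the node is already drained. */
    void onDrained(Runnable callback) {
        boolean runNow;
        synchronized (this) {
            runNow = isDrained();
            if (!runNow) {
                drainCallbacks.add(callback);
            }
        }
        if (runNow) {
            callback.run();
        }
    }

    /** Called by the (hypothetical) task-tracking code whenever a task on this node finishes. */
    void taskFinished() {
        List<Runnable> ready;
        synchronized (this) {
            if (runningTasks > 0) {
                runningTasks--;
            }
            if (!isDrained() || drainCallbacks.isEmpty()) {
                return;
            }
            // Take ownership of pending callbacks so each fires only once.
            ready = new ArrayList<>(drainCallbacks);
            drainCallbacks.clear();
        }
        // Run callbacks outside the lock so they can safely query or reconfigure the node.
        for (Runnable r : ready) {
            r.run();
        }
    }

    synchronized int runningTasks() {
        return runningTasks;
    }
}
```

The process handing slots to the new JobTracker would register a callback with `onDrained` instead of polling; the first option (having the scaling process watch the running tasks itself) amounts to that process calling something like `isDrained()` in a loop.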
Another thing we may want to support is killing tasks after a timeout if the cluster still hasn't scaled down. I think this can already be done through the MRAdmin shell command / API, though.
In either case, we probably need some API to see what's running on the cluster. Some of the existing MRAdmin commands might be enough, but we may want to add to them. That can be handled in a separate JIRA, though.