The MapReduce JobTracker does a poor job of allocating tasks to TaskTracker worker nodes.
The problems are:
1) There is no speculative execution of tasks
2) Reduce tasks must wait until all map tasks are completed before doing any work
3) TaskTrackers don't distinguish between map and reduce tasks. Also, the number of
tasks at a single node is limited to a fixed constant. That means you can get deadlocks
when a machine fails and a map task has to be rerun. The reduces take up all the available
execution slots, but they don't do productive work, because they're waiting for that map task
to complete. Of course, the map task can't even start until a reduce task finishes and frees
a slot, so nothing makes progress.
4) The JobTracker is so complicated that it's hard to fix any of these.
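The deadlock in problem 3 can be stated as a small predicate. This is only an illustrative sketch: the class SlotDeadlockSketch, the Kind enum and the isDeadlocked method are hypothetical names, and the model assumes a single node whose slots are shared by map and reduce tasks.

```java
import java.util.List;

/** Hypothetical model of one node with a fixed number of undifferentiated slots. */
class SlotDeadlockSketch {
    enum Kind { MAP, REDUCE }

    /**
     * Returns true when nothing can make progress: every slot is taken,
     * only reduces are running, and at least one map is still waiting to start.
     */
    static boolean isDeadlocked(int slots, List<Kind> running, int pendingMaps) {
        boolean slotsFull = running.size() >= slots;
        boolean onlyReducesRunning = !running.contains(Kind.MAP);
        // Reduces cannot finish while a map is pending, and a pending map
        // cannot start while every slot is occupied.
        return pendingMaps > 0 && slotsFull && onlyReducesRunning;
    }
}
```

With two slots, two running reduces and one pending map, the predicate is true. Adding a third slot, or reserving slots per task kind, makes it false.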
The right solution is to rewrite the JobTracker so that it handles tasks much more flexibly.
It also has to be a lot simpler. One way to simplify it is to add an abstraction I'll call
"TaskInProgress". Each Job is broken into chunks, and each chunk is a TaskInProgress. Every
TaskInProgress must be completed, one way or another, before the Job is complete.
A single TaskInProgress can be executed by one or more Tasks. TaskTrackers are assigned Tasks.
If a Task fails, the failure is reported back to the JobTracker, where the TaskInProgress (TIP)
lives. The TIP can then decide whether to launch additional Tasks.
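A minimal sketch of how a TaskInProgress might track its Tasks and react to a failure report. Everything here is an assumption made for illustration: the class name TaskInProgressSketch, its methods, and in particular the retry limit maxAttempts, which the note above does not specify.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical TaskInProgress: one chunk of a Job, executed by one or more Tasks. */
class TaskInProgressSketch {
    enum State { RUNNING, SUCCEEDED, FAILED }

    private final int maxAttempts; // assumed policy knob, not from the original note
    private final Map<Integer, State> tasks = new HashMap<>();
    private int nextTaskId = 0;

    TaskInProgressSketch(int maxAttempts) {
        this.maxAttempts = maxAttempts;
    }

    /** Creates a new Task for this chunk of work and returns its id. */
    int launchTask() {
        int id = nextTaskId++;
        tasks.put(id, State.RUNNING);
        return id;
    }

    /**
     * Called when a failure is reported back to the JobTracker.
     * Returns the id of a replacement Task, or -1 if the TIP decides not to launch one.
     */
    int taskFailed(int id) {
        tasks.put(id, State.FAILED);
        if (isComplete() || tasks.size() >= maxAttempts) {
            return -1;
        }
        return launchTask();
    }

    void taskSucceeded(int id) {
        tasks.put(id, State.SUCCEEDED);
    }

    /** The TIP is complete as soon as any one of its Tasks has succeeded. */
    boolean isComplete() {
        return tasks.containsValue(State.SUCCEEDED);
    }
}
```

The point of the design is that the retry decision lives in one place, inside the TIP, so the rest of the JobTracker only has to forward status reports.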
Speculative execution is handled within the TIP, which simply launches multiple Tasks in parallel.
The TaskTrackers have no idea that these Tasks are actually doing the same chunk of work. The TIP
is complete when any one of its Tasks is complete.
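Speculative execution inside a TIP could look roughly like the sketch below. The class SpeculativeTipSketch and its methods are hypothetical. Returning the still-running duplicate Tasks so the caller can cancel them is an added assumption; the note above only says that the first finished Task completes the TIP.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical TIP that runs several identical Tasks and finishes on the first success. */
class SpeculativeTipSketch {
    private final Set<Integer> running = new HashSet<>();
    private int nextTaskId = 0;
    private boolean complete = false;

    /** Launches the given number of identical Tasks and returns their ids. */
    Set<Integer> launchSpeculative(int copies) {
        Set<Integer> launched = new HashSet<>();
        for (int i = 0; i < copies; i++) {
            int id = nextTaskId++;
            running.add(id);
            launched.add(id);
        }
        return launched;
    }

    /**
     * Marks the TIP complete on the first success and returns the ids of the
     * duplicate Tasks that were still running (assumed to be cancelled by the caller).
     */
    Set<Integer> taskSucceeded(int id) {
        running.remove(id);
        complete = true;
        Set<Integer> redundant = new HashSet<>(running);
        running.clear();
        return redundant;
    }

    boolean isComplete() {
        return complete;
    }
}
```

Each TaskTracker sees only an ordinary Task id; knowledge that several ids map to the same chunk of work stays inside the TIP.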