  1. Hadoop Common
  2. HADOOP-7

MapReduce has a series of problems concerning task-allocation to worker nodes

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.1.0
    • Component/s: None
    • Labels: None
    • Environment: All

      Description

      The MapReduce JobTracker is not great at allocating tasks to TaskTracker worker nodes.

      Here are the problems:
      1) There is no speculative execution of tasks
      2) Reduce tasks must wait until all map tasks are completed before doing any work
      3) TaskTrackers don't distinguish between map and reduce tasks. Also, the number of
      tasks at a single node is limited to some constant. That means you can get weird deadlock
      problems upon machine failure. The reduces take up all the available execution slots, but they
      don't do productive work, because they're waiting for a map task to complete. Of course, that
      map task won't even be started until the reduce tasks finish, so you can see the problem...
      4) The JobTracker is so complicated that it's hard to fix any of these.
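      The deadlock in problem 3 can be illustrated with a toy model. This is only a sketch:
      the class, the method names, and the slot counts are assumptions made for illustration,
      and the separate map/reduce slot pools are one possible remedy, not something this
      issue prescribes.

```java
/** Toy model of problem 3; all names and numbers here are illustrative assumptions. */
class SlotDeadlockSketch {
    /**
     * Shared pool: maps and reduces compete for the same slots on a node.
     * A pending map can start only if some slot is not held by a waiting reduce.
     */
    static boolean mapCanStartShared(int slotsPerNode, int waitingReduces) {
        return waitingReduces < slotsPerNode;
    }

    /**
     * Separate pools (an assumed remedy): reduces can never occupy a map slot,
     * so only running maps count against the map limit.
     */
    static boolean mapCanStartSeparate(int mapSlots, int runningMaps) {
        return runningMaps < mapSlots;
    }

    public static void main(String[] args) {
        // Two slots, both held by reduces waiting on a lost map's output.
        System.out.println(mapCanStartShared(2, 2));   // false: the map can never start
        // One dedicated map slot, currently idle.
        System.out.println(mapCanStartSeparate(1, 0)); // true: the map can be rerun
    }
}
```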

      The right solution is a rewrite of the JobTracker to be a lot more flexible in task handling.
      It has to be a lot simpler. One way to make it simpler is to add an abstraction I'll call
      "TaskInProgress". Jobs are broken into chunks called TasksInProgress. All the TaskInProgress
      objects must be complete, somehow, before the Job is complete.

      A single TaskInProgress (TIP) can be executed by one or more Tasks. TaskTrackers are assigned Tasks.
      If a Task fails, we report it back to the JobTracker, where the TIP lives. The TIP can then
      decide whether to launch additional Tasks or not.
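
      A minimal sketch of how a TaskInProgress might track its Tasks and react to a failure
      report. It is not the code in jobtracker.patch: the class name, the method names, and the
      retry limit (maxAttempts) are all assumptions made for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical TaskInProgress: one chunk of a job, run by one or more Task attempts. */
class TipSketch {
    enum State { RUNNING, SUCCEEDED, FAILED }

    private final String tipId;
    private final int maxAttempts; // assumed retry limit; the issue does not specify one
    private final Map<String, State> attempts = new LinkedHashMap<>();

    TipSketch(String tipId, int maxAttempts) {
        this.tipId = tipId;
        this.maxAttempts = maxAttempts;
    }

    /** Creates a new Task attempt for the JobTracker to assign to a TaskTracker. */
    String launchTask() {
        String taskId = tipId + "_attempt_" + attempts.size();
        attempts.put(taskId, State.RUNNING);
        return taskId;
    }

    /** A TaskTracker reported success for a known attempt: the TIP is now complete. */
    void taskSucceeded(String taskId) {
        attempts.replace(taskId, State.SUCCEEDED); // unknown ids are ignored
    }

    /** A TaskTracker reported failure. Returns true if the TIP wants another Task launched. */
    boolean taskFailed(String taskId) {
        attempts.replace(taskId, State.FAILED); // unknown ids are ignored
        return !isComplete()
                && !attempts.containsValue(State.RUNNING)
                && attempts.size() < maxAttempts;
    }

    /** The TIP is complete as soon as any one of its Tasks has succeeded. */
    boolean isComplete() {
        return attempts.containsValue(State.SUCCEEDED);
    }

    public static void main(String[] args) {
        TipSketch tip = new TipSketch("tip_0001", 3);
        String first = tip.launchTask();
        if (tip.taskFailed(first)) {          // failure reported back to the JobTracker
            String retry = tip.launchTask();  // the TIP decides to launch another Task
            tip.taskSucceeded(retry);
        }
        System.out.println(tip.isComplete()); // true
    }
}
```

      Keeping the retry decision inside the TIP means the TaskTracker only ever sees individual
      Tasks and needs no knowledge of how many attempts a chunk has had.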

      Speculative execution is handled within the TIP. It simply launches multiple Tasks in parallel. The
      TaskTrackers have no idea that these Tasks are actually doing the same chunk of work. The TIP
      is complete when any one of its Tasks is complete.
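
      The "first Task to finish wins" rule can be sketched inside a single JVM with the standard
      ExecutorService.invokeAny call. This is only an analogy under stated assumptions: the real
      Tasks run on separate TaskTrackers, and the class and method names below are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/** Hypothetical in-process analogy for speculative execution within a TIP. */
class SpeculativeTipSketch {
    /**
     * Runs `copies` identical attempts of the same chunk in parallel and returns
     * the result of the first attempt that completes successfully.
     * `copies` must be at least 1.
     */
    static <T> T runSpeculatively(Callable<T> chunk, int copies) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(copies);
        try {
            List<Callable<T>> attempts = new ArrayList<>();
            for (int i = 0; i < copies; i++) {
                attempts.add(chunk); // every attempt does the same chunk of work
            }
            // invokeAny returns one successful result and cancels the remaining attempts.
            return pool.invokeAny(attempts);
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        Integer sum = runSpeculatively(() -> 1 + 2 + 3, 3);
        System.out.println(sum); // 6
    }
}
```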

        Attachments

        1. jobtracker.patch
          113 kB
          Mike Cafarella

            People

            • Assignee: Unassigned
            • Reporter: Mike Cafarella (michael_cafarella)