HADOOP-3136: Assign multiple tasks per TaskTracker heartbeat

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.20.0
    • Component/s: None
    • Labels: None

      Description

      In today's logic of finding a new task, we assign only one task per heartbeat.

      We could give the TaskTracker multiple tasks, subject to the number of free slots it has. For maps, we would assign data-local tasks first, and run some logic to decide what to assign when we run out of data-local tasks (e.g., tasks from overloaded racks, tasks with the least locality, etc.). In addition to maps, if the tracker has free reduce slots, we could give it reduce task(s) as well. For reduces, we could run some logic to give more tasks to nodes that are closer to the nodes running the most maps (assuming the data generated is proportional to the number of maps). For example, if rack1 has 70% of the input splits, and we know that most maps are data/rack local, we would try to schedule ~70% of the reducers there.

      Thoughts?
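The local-first, multi-task-per-heartbeat idea above can be sketched in a few lines of Java. This is a minimal illustrative sketch, not the actual JobTracker scheduler: the `Task` class, the `assignMaps` method, and the two-pass "data-local first, then anything" policy are all assumptions made for illustration.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical sketch: hand a tracker several map tasks in one heartbeat,
// preferring data-local tasks and falling back to non-local ones.
public class MultiAssignSketch {

    // Simplified task: a name plus whether its input is local to the tracker.
    static class Task {
        final String name;
        final boolean dataLocal;
        Task(String name, boolean dataLocal) {
            this.name = name;
            this.dataLocal = dataLocal;
        }
    }

    /** Assign up to freeMapSlots tasks from the pending queue,
     *  taking data-local tasks first. */
    static List<Task> assignMaps(Deque<Task> pending, int freeMapSlots) {
        List<Task> assigned = new ArrayList<>();
        // First pass: pick data-local tasks, up to the free-slot limit.
        for (Task t : new ArrayList<>(pending)) {
            if (assigned.size() >= freeMapSlots) break;
            if (t.dataLocal) {
                pending.remove(t);
                assigned.add(t);
            }
        }
        // Fallback pass: fill any remaining slots with non-local tasks.
        while (assigned.size() < freeMapSlots && !pending.isEmpty()) {
            assigned.add(pending.poll());
        }
        return assigned;
    }

    public static void main(String[] args) {
        Deque<Task> pending = new ArrayDeque<>();
        pending.add(new Task("m1", false));
        pending.add(new Task("m2", true));
        pending.add(new Task("m3", true));
        pending.add(new Task("m4", false));
        // Three free slots: the two local tasks go first, then one non-local.
        for (Task t : assignMaps(pending, 3)) {
            System.out.println(t.name);
        }
    }
}
```

Under this policy a tracker with three free slots receives both data-local tasks (m2, m3) and one non-local task (m1) in a single heartbeat, instead of a single task per round trip. The reduce-side heuristic from the description (placing ~70% of reducers on the rack holding ~70% of splits) would be a separate weighting step on top of the same slot-limited loop.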

      Attachments

      1. HADOOP-3136_0_20080805.patch
        8 kB
        Arun C Murthy
      2. HADOOP-3136_1_20080809.patch
        7 kB
        Arun C Murthy
      3. HADOOP-3136_2_20080911.patch
        11 kB
        Arun C Murthy
      4. HADOOP-3136_3_20081211.patch
        24 kB
        Arun C Murthy
      5. HADOOP-3136_4_20081212.patch
        41 kB
        Arun C Murthy
      6. HADOOP-3136_5_20081215.patch
        45 kB
        Arun C Murthy

        Activity

        Owen O'Malley made changes -
          Component/s: mapred
        Nigel Daley made changes -
          Status: Resolved → Closed
        Arun C Murthy made changes -
          Resolution: Fixed
          Status: Patch Available → Resolved
        Arun C Murthy made changes -
          Status: Open → Patch Available
          Fix Version/s: 0.20.0
        Arun C Murthy made changes -
          Attachment: HADOOP-3136_5_20081215.patch
        Nigel Daley made changes -
          Fix Version/s: 0.20.0
        Arun C Murthy made changes -
          Status: Patch Available → Open
        Arun C Murthy made changes -
          Status: Open → Patch Available
        Arun C Murthy made changes -
          Attachment: HADOOP-3136_4_20081212.patch
        Arun C Murthy made changes -
          Attachment: HADOOP-3136_3_20081211.patch
        Arun C Murthy made changes -
          Fix Version/s: 0.20.0
        Robert Chansler made changes -
          Fix Version/s: 0.19.0
        Arun C Murthy made changes -
          Status: Patch Available → Open
        Arun C Murthy made changes -
          Status: Open → Patch Available
        Arun C Murthy made changes -
          Attachment: HADOOP-3136_2_20080911.patch
        Arun C Murthy made changes -
          Status: Patch Available → Open
        Arun C Murthy made changes -
          Status: Open → Patch Available
        Arun C Murthy made changes -
          Attachment: HADOOP-3136_1_20080809.patch
        Arun C Murthy made changes -
          Attachment: HADOOP-3136_0_20080805.patch
        Arun C Murthy made changes -
          Fix Version/s: 0.19.0
        Arun C Murthy made changes -
          Assignee: Arun C Murthy
        Mukund Madhugiri made changes -
          Fix Version/s: 0.18.0
        Devaraj Das made changes -
          Description: edited (added "and we know that most maps are data/rack local")
        Devaraj Das created issue -

          People

          • Assignee: Arun C Murthy
          • Reporter: Devaraj Das
          • Votes: 0
          • Watchers: 18
