Key: MAPREDUCE-93
Type: Bug
Status: Open
Priority: Major
Assignee: Devaraj Das
Reporter: Runping Qi
Votes: 0
Watchers: 8
Hadoop Map/Reduce

Job Tracker should prefer input-splits from overloaded racks

Created: 09/Oct/07 02:08 PM   Updated: 20/Jun/09 07:50 AM
Component/s: None
Affects Version/s: None
Fix Version/s: None

Time Tracking:
Not Specified

Issue Links:
Reference
 


Description
Currently, when the Job Tracker assigns a mapper task to a task tracker that has no local split, the
job tracker finds the first runnable task in the master task list and assigns it to that task tracker.
The split for the task is not local to the task tracker, of course. However, the split may be local to other task trackers.
Assigning that task to this task tracker may decrease the potential number of mapper attempts with data locality.
The desired behavior in this situation is to choose a task whose split is not local to any task tracker.
Resort to the current behavior only if no such task is found.
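
The fallback described above could be sketched roughly as follows. This is only an illustration, not the Job Tracker's actual code: `PendingTask`, `NonLocalPicker`, and the host-name sets are hypothetical stand-ins, and the sketch assumes the scheduler knows which hosts hold a replica of each split and which hosts run a live task tracker.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical stand-in for a runnable map task; not a Hadoop class. */
final class PendingTask {
    final String id;
    final Set<String> splitHosts; // hosts that hold a replica of the task's split

    PendingTask(String id, String... splitHosts) {
        this.id = id;
        this.splitHosts = new HashSet<String>(Arrays.asList(splitHosts));
    }
}

final class NonLocalPicker {
    /**
     * Picks a task for a tracker that has no local split.
     * Prefers a task whose split is local to no live tracker; if every
     * runnable task is local somewhere, falls back to the first runnable
     * task (the current behavior). Returns null if nothing is runnable.
     */
    static PendingTask pick(List<PendingTask> runnable, Set<String> liveTrackers) {
        for (PendingTask task : runnable) {
            // disjoint == true means no live tracker holds this split
            if (Collections.disjoint(task.splitHosts, liveTrackers)) {
                return task;
            }
        }
        return runnable.isEmpty() ? null : runnable.get(0);
    }
}
```

With trackers on h1 and h2, a task list of t0 (split on h1) and t1 (split on h9) would yield t1, leaving t0 available for a later data-local assignment on h1.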

In general, it will be useful to know the number of task trackers to which each split is local.
To assign a task to a task tracker, the job tracker should first try to pick a task that is local to the task tracker and that has the minimal number of task trackers to which it is local. If no task is local to the task tracker, the job tracker should try to pick a task that has the minimal number of task trackers to which it is local.
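
A minimal sketch of this two-level preference, under the same assumptions as the proposal: `SplitTask` and `LocalityAwarePicker` are hypothetical names introduced for illustration (not Hadoop classes), and ties are broken by list order, which the proposal does not specify.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical stand-in for a runnable map task; not a Hadoop class. */
final class SplitTask {
    final String id;
    final Set<String> hosts; // hosts that hold a replica of the task's split

    SplitTask(String id, String... hosts) {
        this.id = id;
        this.hosts = new HashSet<String>(Arrays.asList(hosts));
    }

    /** Number of live task trackers to which this task's split is local. */
    int localTrackerCount(Set<String> liveTrackers) {
        int count = 0;
        for (String host : hosts) {
            if (liveTrackers.contains(host)) {
                count++;
            }
        }
        return count;
    }
}

final class LocalityAwarePicker {
    /**
     * Prefers a task local to the requesting tracker with the fewest other
     * local options; if none is local, returns the task with the fewest
     * local trackers overall. Returns null if nothing is runnable.
     */
    static SplitTask pick(List<SplitTask> runnable, String tracker, Set<String> liveTrackers) {
        SplitTask bestLocal = null;
        SplitTask bestAny = null;
        int bestLocalCount = Integer.MAX_VALUE;
        int bestAnyCount = Integer.MAX_VALUE;
        for (SplitTask task : runnable) {
            int count = task.localTrackerCount(liveTrackers);
            if (count < bestAnyCount) {
                bestAny = task;
                bestAnyCount = count;
            }
            if (task.hosts.contains(tracker) && count < bestLocalCount) {
                bestLocal = task;
                bestLocalCount = count;
            }
        }
        return bestLocal != null ? bestLocal : bestAny;
    }
}
```

The single pass keeps both candidates so the fallback does not need a second scan. A real scheduler would likely maintain the counts incrementally rather than recompute them on every request.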

It is worthwhile to instrument the job tracker code to report the number of splits that are local to at least one task tracker.
That number is an upper bound on the number of tasks that can run with data locality. By comparing it with the actual number of
data-local mappers launched, we can gauge the effectiveness of the job tracker scheduling.
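
The comparison could be computed along these lines. This is a sketch only: `LocalityReport` and its methods are hypothetical and do not correspond to any existing job tracker counter, and treating an empty upper bound as a ratio of 1.0 is an assumption made here to avoid division by zero.

```java
import java.util.Collections;
import java.util.List;
import java.util.Set;

/** Hypothetical reporting helper; not part of Hadoop. */
final class LocalityReport {
    /**
     * Counts the splits that have at least one replica on a host running a
     * live task tracker. This is the most map tasks that could be data local.
     */
    static int maxDataLocalTasks(List<Set<String>> splitHosts, Set<String> liveTrackers) {
        int count = 0;
        for (Set<String> hosts : splitHosts) {
            if (!Collections.disjoint(hosts, liveTrackers)) {
                count++;
            }
        }
        return count;
    }

    /**
     * Ratio of data-local mappers actually launched to the upper bound.
     * Returns 1.0 when no split could have been local (assumed convention).
     */
    static double effectiveness(int launchedDataLocal, int maxDataLocal) {
        return maxDataLocal == 0 ? 1.0 : (double) launchedDataLocal / maxDataLocal;
    }
}
```

A ratio well below 1.0 would suggest the scheduler is giving away splits that could have been processed locally.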

When we introduce rack locality, we should apply the same principle.



Change History
Runping Qi made changes - 09/Oct/07 04:41 PM
Field Original Value New Value
Component/s mapred [ 12310690 ]
Description: original and new values both repeat the Description text above.

Runping Qi made changes - 08/Jan/08 10:18 PM
Assignee Devaraj Das [ devaraj ]
eric baldeschwieler made changes - 11/Jan/08 10:34 PM
Link This issue is blocked by HADOOP-2560 [ HADOOP-2560 ]
eric baldeschwieler made changes - 11/Jan/08 10:34 PM
Link This issue is blocked by HADOOP-2560 [ HADOOP-2560 ]
eric baldeschwieler made changes - 11/Jan/08 10:35 PM
Link This issue relates to HADOOP-2560 [ HADOOP-2560 ]
Owen O'Malley made changes - 07/Feb/08 09:58 PM
Summary: "Job Tracker should not clobber the data locality of tasks" -> "Job Tracker should prefer input-splits from overloaded racks"
Sameer Paranjpye made changes - 08/Feb/08 10:21 AM
Link This issue relates to HADOOP-2119 [ HADOOP-2119 ]
Owen O'Malley made changes - 20/Jun/09 07:50 AM
Component/s mapred [ 12310690 ]
Key: HADOOP-2014 -> MAPREDUCE-93
Project: Hadoop Common [ 12310240 ] -> Hadoop Map/Reduce [ 12310941 ]