Hadoop Common / HADOOP-465

Jobtracker doesn't always spread reduce tasks evenly if (mapred.tasktracker.tasks.maximum > 1)


Details

    • Type: Bug
    • Status: Closed
    • Priority: Minor
    • Resolution: Duplicate
    • Affects Version/s: None
    • Fix Version/s: 0.6.0
    • Component/s: None
    • Labels: None

    Description

      I note that (at least for Nutch 0.8 Generator.Selector.reduce) if mapred.reduce.tasks is the same as the number of tasktrackers, and mapred.tasktracker.tasks.maximum is left at the default of 2, I typically have no reduce tasks running on a few of my tasktrackers, and two reduce tasks running on an equal number of other tasktrackers.
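
      To make the failure mode concrete, here is a minimal, hypothetical Java sketch of one way such a skew can arise (the class and method names are invented for illustration; this is not the JobTracker's actual code). If each tasktracker that reports first is filled up to mapred.tasktracker.tasks.maximum before later trackers receive anything, the "two tasks here, zero there" pattern falls out directly:

      {code:java}
      import java.util.Arrays;
      import java.util.LinkedHashMap;
      import java.util.List;
      import java.util.Map;

      // Hypothetical model of a greedy, heartbeat-order assignment.
      // Names (GreedyAssignDemo, assignGreedy) are made up for illustration.
      public class GreedyAssignDemo {
          // Fill each tracker to its maximum, in heartbeat order, before
          // moving on to the next tracker.
          static Map<String, Integer> assignGreedy(List<String> trackers,
                                                   int reduceTasks,
                                                   int maxPerTracker) {
              Map<String, Integer> load = new LinkedHashMap<>();
              for (String t : trackers) {
                  load.put(t, 0);
              }
              int remaining = reduceTasks;
              for (String t : trackers) {
                  while (remaining > 0 && load.get(t) < maxPerTracker) {
                      load.put(t, load.get(t) + 1);
                      remaining--;
                  }
              }
              return load;
          }

          public static void main(String[] args) {
              List<String> trackers = Arrays.asList("tt1", "tt2", "tt3", "tt4");
              // 4 reduce tasks, 4 trackers, maximum 2 per tracker:
              // prints {tt1=2, tt2=2, tt3=0, tt4=0} -- two trackers sit idle.
              System.out.println(assignGreedy(trackers, 4, 2));
          }
      }
      {code}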

      It seems like the jobtracker should assign reduce tasks to tasktrackers in a round-robin fashion, so that the distribution is spread as evenly as possible. The current implementation would seem to waste at least some time whenever one or more slave machines must execute two reduce tasks simultaneously while other tasktrackers sit idle, with the amount of wasted time depending on how heavily the reduce tasks contend for the slave machine's resources.
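
      For comparison, a round-robin variant of the same hypothetical sketch (again invented names, a toy model rather than a proposed patch) hands each tracker at most one new reduce task per pass, so no tracker takes a second task while another still has none:

      {code:java}
      import java.util.Arrays;
      import java.util.LinkedHashMap;
      import java.util.List;
      import java.util.Map;

      // Hypothetical round-robin assignment: cycle over the trackers,
      // giving each at most one new reduce task per pass, until every
      // task is placed or every tracker is at its maximum.
      public class RoundRobinAssignDemo {
          static Map<String, Integer> assignRoundRobin(List<String> trackers,
                                                       int reduceTasks,
                                                       int maxPerTracker) {
              Map<String, Integer> load = new LinkedHashMap<>();
              for (String t : trackers) {
                  load.put(t, 0);
              }
              int remaining = reduceTasks;
              while (remaining > 0) {
                  boolean placedAny = false;
                  for (String t : trackers) {
                      if (remaining > 0 && load.get(t) < maxPerTracker) {
                          load.put(t, load.get(t) + 1);
                          remaining--;
                          placedAny = true;
                      }
                  }
                  if (!placedAny) break; // every tracker is at its maximum
              }
              return load;
          }

          public static void main(String[] args) {
              List<String> trackers = Arrays.asList("tt1", "tt2", "tt3", "tt4");
              // 4 reduce tasks, 4 trackers, maximum 2 per tracker:
              // prints {tt1=1, tt2=1, tt3=1, tt4=1} -- one task per tracker.
              System.out.println(assignRoundRobin(trackers, 4, 2));
          }
      }
      {code}

      With the same inputs (four trackers, four reduce tasks, a maximum of two per tracker), this version places exactly one task on each tracker.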

      I first thought that perhaps the jobtracker was "overloading" the tasktrackers that had already finished their map tasks (and avoiding those that were still mapping). However, as I understand it, the reduce tasks are all launched at the beginning of the job so that they are all ready and waiting for map output data when it first appears.

    Attachments

    Activity

    People

        Assignee: Unassigned
        Reporter: Chris Schneider (schmed)
        Votes: 0
        Watchers: 0

    Dates

        Created:
        Updated:
        Resolved: