[HADOOP-400] the job tracker re-runs failed tasks on the same node - ASF JIRA

Details

Type: Bug
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: 0.4.0
Fix Version/s: 0.6.0
Component/s: None
Labels: None

Description

The job tracker tries to avoid re-running a task on a node where it has previously failed, but it doesn't strictly prevent it.

I propose changing the rule so that when pollForNewTask is called by a TaskTracker, the JobTracker will assign it a task that has already failed on that TaskTracker only if the task has already failed on every TaskTracker in the cluster. Thus, on "normal" clusters with more than 4 TaskTrackers, a task is guaranteed to run on 4 different TaskTrackers. On smaller clusters, it will run on every TaskTracker in the cluster at least once.
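To make the proposed rule concrete, here is a minimal sketch of the check in Java. It is not taken from the attached patch: the class name `RetryPlacement`, the method `mayAssign`, and the use of plain string sets for tracker names are all assumptions made for illustration, and the real JobTracker data structures may look quite different.

```java
import java.util.Set;

/** Hypothetical illustration of the proposed rule; not the actual JobTracker code. */
final class RetryPlacement {

    private RetryPlacement() {}

    /**
     * Decides whether a task may be handed to the TaskTracker that is polling.
     *
     * @param tracker         name of the polling TaskTracker (assumed identifier)
     * @param failedOn        trackers on which this task has already failed
     * @param clusterTrackers all trackers currently known to the cluster
     */
    static boolean mayAssign(String tracker,
                             Set<String> failedOn,
                             Set<String> clusterTrackers) {
        // The task has never failed on this tracker, so it is always eligible.
        if (!failedOn.contains(tracker)) {
            return true;
        }
        // The task has failed here before: allow a re-run on this tracker
        // only once it has failed on every tracker in the cluster.
        return failedOn.containsAll(clusterTrackers);
    }
}
```

With a two-tracker cluster, for example, a task that failed on the first tracker would be refused to that tracker but offered to the second; only after failing on both would it become eligible for the first one again.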

Does that sound reasonable to everyone?

Attachments

ASF.LICENSE.NOT.GRANTED--task-schedule.patch
03/Aug/06 19:08
19 kB
Owen O'Malley

People

Assignee: Owen O'Malley

Reporter: Owen O'Malley

Votes: 0

Watchers: 1

Dates

Created: 28/Jul/06 20:33

Updated: 08/Jul/09 16:51

Resolved: 09/Aug/06 13:46