HADOOP-3333: Job failing because of reassigning same TaskTracker to failing tasks

Details

    • Type: Bug
    • Status: Closed
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 0.16.3
    • Fix Version/s: 0.18.0
    • Component/s: None
    • Labels: None
    • Hadoop Flags: Reviewed

    Description

      We have a long-running job on its 2nd attempt. The previous run failed, and the current one risks failing as well, because reduce tasks that fail on marginal TaskTrackers are repeatedly assigned to those same TaskTrackers (probably because they hold the only available slots), so the tasks eventually run out of attempts.
      Reduce tasks should be assigned to the same TaskTracker at most twice, or TaskTrackers need better smarts to detect failing hardware.
      BTW, mapred.reduce.max.attempts=12, which is high, but does not help in this case.
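      For context, a minimal sketch, assuming the classic org.apache.hadoop.mapred API: the first method shows how a job can raise the per-task retry limit the reporter mentions (the default in this era of Hadoop is 4 attempts); the second class is a purely hypothetical illustration of the behavior requested above, tracking trackers a task has already failed on so the scheduler can skip them. Neither reflects the actual patches attached to this issue; all names in FailedTrackerFilter are illustrative only.

      import java.util.HashSet;
      import java.util.Set;

      import org.apache.hadoop.mapred.JobConf;

      public class RetryLimitSketch {

        // Raise the retry limit for reduce tasks, as the reporter did
        // (mapred.reduce.max.attempts=12); the stock default is 4.
        static JobConf configureRetries(JobConf conf) {
          conf.setInt("mapred.reduce.max.attempts", 12);
          return conf;
        }

        // Hypothetical per-task bookkeeping: remember which TaskTrackers an
        // attempt has already failed on, and skip them when placing the next
        // attempt. Class and method names are assumptions, not Hadoop API.
        static class FailedTrackerFilter {
          private final Set<String> failedOn = new HashSet<String>();

          void recordFailure(String trackerName) {
            failedOn.add(trackerName);
          }

          boolean shouldSchedule(String trackerName) {
            return !failedOn.contains(trackerName);
          }
        }
      }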

      Attachments

        1. hadoop-3333-v3.patch
          17 kB
          Jothi Padmanabhan
        2. hadoop-3333-v2.patch
          16 kB
          Jothi Padmanabhan
        3. hadoop-3333-v1.patch
          16 kB
          Jothi Padmanabhan
        4. hadoop-3333.patch
          16 kB
          Jothi Padmanabhan
        5. HADOOP-3333_2_20080506.patch
          12 kB
          Arun Murthy
        6. HADOOP-3333_1_20080505.patch
          11 kB
          Arun Murthy
        7. HADOOP-3333_0_20080503.patch
          9 kB
          Arun Murthy

            People

              Assignee: Jothi Padmanabhan (jothipn)
              Reporter: Christian Kunz (ckunz)
              Votes: 0
              Watchers: 0

              Dates

                Created:
                Updated:
                Resolved: