Details
- Type: Bug
- Status: Closed
- Priority: Blocker
- Resolution: Fixed
- Affects Version/s: 0.16.3
- Fix Version/s: None
- Component/s: None
- Hadoop Flags: Reviewed
Description
We have a long-running job in its 2nd attempt. The previous job failed, and the current job risks failing as well, because reduce tasks that fail on marginal TaskTrackers are repeatedly assigned to the same TaskTrackers (probably because those are the only available slots), eventually running out of attempts.
Reduce tasks should be assigned to the same TaskTracker at most twice, or TaskTrackers need better heuristics for detecting failing hardware.
BTW, mapred.reduce.max.attempts=12, which is high, but does not help in this case.
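The "at most twice per TaskTracker" policy proposed above could be sketched as a small per-task failure ledger. This is a hypothetical illustration, not Hadoop's actual JobTracker scheduling code; the class and method names are invented for the sketch.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the proposed policy: remember how often each reduce
// task has failed on each TaskTracker, and stop offering that task to a
// tracker once it has failed there MAX_PER_TRACKER times. Names are invented
// for illustration and do not match Hadoop's real JobInProgress internals.
public class ReduceTaskBlacklist {
    static final int MAX_PER_TRACKER = 2; // "at most twice", per the report

    // taskId -> (trackerName -> count of failed attempts on that tracker)
    private final Map<String, Map<String, Integer>> failures = new HashMap<>();

    public void recordFailure(String taskId, String tracker) {
        failures.computeIfAbsent(taskId, k -> new HashMap<>())
                .merge(tracker, 1, Integer::sum);
    }

    // A tracker stays eligible for a task only while its failure count
    // for that task is below the cap; other trackers are unaffected.
    public boolean canSchedule(String taskId, String tracker) {
        return failures.getOrDefault(taskId, Collections.emptyMap())
                       .getOrDefault(tracker, 0) < MAX_PER_TRACKER;
    }

    public static void main(String[] args) {
        ReduceTaskBlacklist bl = new ReduceTaskBlacklist();
        bl.recordFailure("reduce_0001", "tt-marginal");
        System.out.println(bl.canSchedule("reduce_0001", "tt-marginal")); // true: one failure
        bl.recordFailure("reduce_0001", "tt-marginal");
        System.out.println(bl.canSchedule("reduce_0001", "tt-marginal")); // false: cap reached
        System.out.println(bl.canSchedule("reduce_0001", "tt-healthy"));  // true: other trackers fine
    }
}
```

With such a cap, a high mapred.reduce.max.attempts (12 here) would be spent across different trackers instead of being burned repeatedly on the same marginal machine.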
Attachments
Issue Links
- is related to: HADOOP-3403 Job tracker's ExpireTackers thread gets NullPointerException if a tasktracker is lost. (Closed)