Details
Type: Bug
Status: Closed
Priority: Blocker
Resolution: Fixed
0.13.0
None
None
Description
Timed-out tasks (and tasks that fail with FSError) are marked as KILLED rather than FAILED. The major issue is that, since HADOOP-1050, only FAILED task attempts are counted when deciding whether the TIP has failed. Hence there is a corner case where a TIP with 4 timed-out task attempts isn't marked as FAILED, and so the job keeps running.
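To make the corner case concrete, here is a minimal sketch in Java. It is not the actual JobTracker/TaskInProgress code: the class `TipFailureSketch`, the enum `AttemptState`, the method `tipFailed`, and the constant `MAX_TASK_FAILURES` are all hypothetical names, and the limit of 4 is assumed from the "4 timed-out tasks" mentioned above.

```java
import java.util.Collections;
import java.util.List;

// Hypothetical model of the failure check; not Hadoop's real classes.
class TipFailureSketch {
    enum AttemptState { SUCCEEDED, FAILED, KILLED }

    // Assumed attempt limit, taken from the "4 timed-out tasks" in the report.
    static final int MAX_TASK_FAILURES = 4;

    // Counts only FAILED attempts, as described for the post-HADOOP-1050 behaviour.
    static boolean tipFailed(List<AttemptState> attempts) {
        long failed = attempts.stream()
                .filter(s -> s == AttemptState.FAILED)
                .count();
        return failed >= MAX_TASK_FAILURES;
    }

    public static void main(String[] args) {
        // Four timed-out attempts, each recorded as KILLED.
        List<AttemptState> timedOut = Collections.nCopies(4, AttemptState.KILLED);
        // Four attempts recorded as FAILED, for comparison.
        List<AttemptState> failed = Collections.nCopies(4, AttemptState.FAILED);

        System.out.println(tipFailed(timedOut)); // false: the TIP never fails
        System.out.println(tipFailed(failed));   // true
    }
}
```

In this model, recording a timeout as FAILED instead of KILLED is enough to make the first case return true as well.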
Since this is a corner case and fixing it will require non-trivial changes to the TaskTracker's control flow (ugly as it is right now), I propose fixing it in 0.13.1 (if need be) or, better, in 0.14.
Thoughts?