Details
Type: Improvement
Status: Open
Priority: Major
Resolution: Unresolved
0.20.205.0, 0.23.0
None
None
Description
Currently, some task failures can cause the whole job to fail. For example, a local TaskTracker (TT) disk failure encountered in TaskLauncher#run, TaskRunner#run, or MapTask#run is visible to the JobClient and can hang it, causing the job to fail. Job execution should always survive a task failure as long as sufficient resources remain.
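The expected behavior can be illustrated with a small Java sketch: a task failure caused by one node's local disk should cost a retry on another node, not the job. The sketch is hypothetical. `RetryingJobSketch`, `runJob`, and `maxAttemptsPerTask` are illustrative names, not Hadoop's JobTracker/TaskTracker API. It also simplifies by excluding a node after any failure on it. In this model, a task fails the job only when its attempts are exhausted or no healthy node remains.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.BiPredicate;

// Hypothetical sketch of "survive a task failure if resources remain";
// not Hadoop's actual scheduling code.
public class RetryingJobSketch {
    private final List<String> nodes;             // hypothetical TaskTracker hosts
    private final int maxAttemptsPerTask;         // hypothetical per-task retry limit
    private final BiPredicate<String, String> attempt; // (taskId, node) -> true on success

    public RetryingJobSketch(List<String> nodes, int maxAttemptsPerTask,
                             BiPredicate<String, String> attempt) {
        this.nodes = nodes;
        this.maxAttemptsPerTask = maxAttemptsPerTask;
        this.attempt = attempt;
    }

    /** Returns true if every task eventually succeeds on some healthy node. */
    public boolean runJob(List<String> taskIds) {
        Set<String> badNodes = new HashSet<>(); // nodes excluded after a failure
        for (String task : taskIds) {
            if (!runTask(task, badNodes)) {
                return false; // attempts exhausted or no healthy node left
            }
        }
        return true;
    }

    private boolean runTask(String task, Set<String> badNodes) {
        int attempts = 0;
        for (String node : nodes) {
            if (attempts >= maxAttemptsPerTask) {
                break;
            }
            if (badNodes.contains(node)) {
                continue; // skip nodes that already failed (e.g. bad local disk)
            }
            attempts++;
            if (attempt.test(task, node)) {
                return true;
            }
            // Simplification: treat the failure as local to this node and
            // retry the task elsewhere instead of failing the whole job.
            badNodes.add(node);
        }
        return false;
    }
}
```

For example, with nodes `tt1`, `tt2`, and `tt3`, where every attempt on `tt1` fails (a simulated disk failure), `runJob` still returns true because each task is rerun on `tt2`.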
Issue Links
- relates to: MAPREDUCE-3121 DFIP aka 'NodeManager should handle Disk-Failures In Place' (Closed)