[HADOOP-1252] Disk problems should be handled better by the MR framework - ASF JIRA

Details

Type: Bug
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: 0.12.3
Fix Version/s: 0.13.0
Component/s: None
Labels: None

Description

The MR framework should recover from disk failures without causing jobs to hang. This issue covers only a short-term fix: reviewing the code and improving the exception handling so that faulty disks and missing files are detected more reliably. A longer-term approach might be an FS layer that handles failed disks and makes them transparent to the tasks; that will be tracked as a separate issue.
Some of the problems reported so far are ~~HADOOP-1087~~ and a comment by Koji on ~~HADOOP-1200~~ (this list may not be complete). Please add as much detail as possible about disk failures that lead to hung jobs, such as relevant exception traces and steps to reproduce.
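
To make the short-term idea concrete, here is a minimal sketch of probing candidate local directories and skipping the ones that fail, so that a bad disk surfaces as an explicit error rather than a hang. It is not taken from the attached patches and is not Hadoop's API: the class `LocalDirProbe` and its methods `isUsable`, `usableDirs`, and `pickDir` are hypothetical names, and it uses only `java.nio.file` from the JDK.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; not part of Hadoop.
final class LocalDirProbe {

    // Returns true if a small file can be created, written, and deleted in dir.
    // Any I/O or permission failure is treated as "this directory is unusable".
    static boolean isUsable(Path dir) {
        try {
            Files.createDirectories(dir);
            Path probe = Files.createTempFile(dir, "probe", ".tmp");
            try {
                Files.write(probe, new byte[] {1});
            } finally {
                Files.deleteIfExists(probe);
            }
            return true;
        } catch (IOException | SecurityException e) {
            return false;
        }
    }

    // Filters the configured directories down to the ones that pass the probe.
    static List<Path> usableDirs(List<Path> candidates) {
        List<Path> usable = new ArrayList<>();
        for (Path dir : candidates) {
            if (isUsable(dir)) {
                usable.add(dir);
            }
        }
        return usable;
    }

    // Picks the first usable directory, or fails fast with a clear message
    // when every candidate is broken, instead of blocking or retrying forever.
    static Path pickDir(List<Path> candidates) throws IOException {
        List<Path> usable = usableDirs(candidates);
        if (usable.isEmpty()) {
            throw new IOException("No usable local directory among " + candidates);
        }
        return usable.get(0);
    }
}
```

A caller would pass its list of configured local directories to `pickDir` before writing task output and treat the `IOException` as a task failure to report. The probe only shows that a directory was writable at the moment of the check, so writes can still fail afterwards and need their own error handling.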

Attachments

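| File | Uploaded | Size | Author |
| --- | --- | --- | --- |
| 1252.patch | 25/Apr/07 13:21 | 31 kB | Devaraj Das |
| 1252.patch | 27/Apr/07 17:30 | 30 kB | Devaraj Das |
| 1252.new.patch | 03/May/07 15:34 | 33 kB | Devaraj Das |
| 1252.may7.patch | 07/May/07 13:21 | 41 kB | Devaraj Das |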

People

Assignee: Devaraj Das
Reporter: Devaraj Das
Votes: 0
Watchers: 0

Dates

Created: 12/Apr/07 13:07
Updated: 08/Jul/09 16:52
Resolved: 07/May/07 21:25