[HDFS-795] DFS Write pipeline does not detect defective datanode correctly in some cases (HADOOP-3339)

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Duplicate
    • Affects Version/s: 0.20.1
    • Fix Version/s: 0.20.2
    • Component/s: hdfs-client
    • Labels:
      None

      Description

      The HDFS write pipeline does not select the correct datanode in some error cases. One example: say DN2 is the second datanode and the write to it times out because it is in a bad state; the pipeline actually removes the first datanode. If such a defective datanode happens to be the last one in the pipeline, the write is aborted completely with a hard error.

      Essentially, the error occurs when writing to a downstream datanode fails, rather than reading from it. This bug was actually fixed in 0.18 (HADOOP-3339), but HADOOP-1700 essentially reverted it. I am not sure why.

      It is absolutely essential for HDFS to handle failures on a subset of the datanodes in a pipeline. At the very least, we should not have known bugs that lead to hard failures.

      I will attach a patch for a hack that illustrates this problem. I am still thinking about what an automated test for this would look like.

      My preferred target for this fix is 0.20.1.
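
      A schematic sketch of the failure mode described above (this is not the actual DFSClient recovery code; the class name, the hard-coded bad node, and the eject-index-0 loop are assumptions made purely for illustration):

          // Schematic illustration only, not the real DFSClient/DataStreamer code.
          // It mimics the behavior described above: when a write to a downstream
          // datanode fails, recovery keeps blaming the first node instead of the bad one.
          import java.util.ArrayList;
          import java.util.Arrays;
          import java.util.List;

          public class PipelineRecoverySketch {
            public static void main(String[] args) {
              List<String> pipeline = new ArrayList<String>(Arrays.asList("DN1", "DN2", "DN3"));
              String badNode = "DN3";   // the defective datanode, last in the pipeline

              while (!pipeline.isEmpty()) {
                if (!pipeline.contains(badNode)) {
                  System.out.println("write succeeds on " + pipeline);
                  return;
                }
                // Buggy behavior reported here: the ack processing cannot tell which
                // downstream node failed, so recovery ejects pipeline[0] every time.
                System.out.println("write failed; ejecting " + pipeline.remove(0));
              }
              System.out.println("pipeline exhausted: hard failure, write aborted");
            }
          }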

        Issue Links

        • This issue duplicates HDFS-101

          Activity

          Todd Lipcon made changes -
          Status: Open [ 1 ] → Resolved [ 5 ]
          Resolution: Duplicate [ 3 ]
          Todd Lipcon added a comment -

          HDFS-101 duplicates this, and the fix is under way there.

          Todd Lipcon added a comment -

          Great, thanks Hairong. FYI this is happening in 0.20.1, and I think it should be fixed in the branch as well. Let me know if you need help testing a patch - I can reproduce it pretty reliably.

          Hairong Kuang made changes -
          Link: This issue duplicates HDFS-101 [ HDFS-101 ]
          Hairong Kuang added a comment -

          I will get this issue fixed in HDFS-101.

          Todd Lipcon made changes -
          Affects Version/s: 0.20.1 [ 12314048 ]
          Priority: Major [ 3 ] → Critical [ 2 ]
          Component/s: hdfs client [ 12312928 ]
          Todd Lipcon added a comment -

          Upgrading to critical since this is reproducible and causes complete pipeline failure for writers.

          Todd Lipcon made changes -
          Project: Hadoop Common [ 12310240 ] → Hadoop HDFS [ 12310942 ]
          Key: HADOOP-5796 → HDFS-795
          Affects Version/s: 0.19.0 [ 12313211 ]
          Fix Version/s: 0.20.2 [ 12314204 ]
          Fix Version/s: 0.20.2 [ 12314203 ]
          Todd Lipcon added a comment -

          Silly me, now I see that this patch is just to reproduce, not to fix. I will investigate a fix since I have a good petri dish in which to reproduce this issue.

          Todd Lipcon added a comment -

          Not certain if what I'm seeing is the exact same cause, but I have another reproducible case in which the write pipeline recovery decides the first node is dead every time, when in actuality it's the last node that's dead. In my case, I've set up a 3-node HDFS cluster with replication 3, each DN having one 100G volume and one 2G volume. The 2G volumes fill up, throw DiskOutOfSpaceExceptions, and the write pipeline recovers incorrectly when the node that runs out of space is the last. It first ejects pipeline[0], fails again when trying to continue the write on the dead node, ejects the second, then tries again writing only to the failed node. Of course that fails too, and the whole write is aborted.

          I'll try applying this patch (and thinking it through a bit further) and seeing if it resolves the issue.
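
          A rough sketch of the kind of per-datanode settings behind this setup (the mount points /data/big and /data/small are assumptions; the 100G/2G capacities come from the underlying volumes, not from configuration):

              import org.apache.hadoop.conf.Configuration;

              public class TwoVolumeSetupSketch {
                // Illustrative only: datanode-side settings for a node with one large
                // and one small storage directory, as described above.
                public static Configuration datanodeConf() {
                  Configuration conf = new Configuration();
                  conf.set("dfs.data.dir", "/data/big/dfs,/data/small/dfs"); // two storage dirs per DN
                  conf.setInt("dfs.replication", 3);                         // 3-node cluster, replication 3
                  return conf;
                }
              }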

          Owen O'Malley made changes -
          Fix Version/s: 0.20.2 [ 12314203 ]
          Fix Version/s: 0.20.1 [ 12313866 ]
          Raghu Angadi made changes -
          Priority: Blocker [ 1 ] → Major [ 3 ]
          Raghu Angadi made changes -
          Attachment: toreproduce-5796.patch [ 12407650 ]
          Raghu Angadi added a comment -

          The attached patch toreproduce-5796.patch helps illustrate the problem. How to reproduce:

          Create an HDFS cluster with 2 datanodes. For one of them, set "dfs.datanode.address" to "0.0.0.0:50013". Now try to write a 5MB file. You will notice that whenever the datanode on port 50013 is the last datanode in the pipeline, the write is aborted.
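
          A minimal client-side sketch of this reproduction (assuming the attached patch is applied, a 2-datanode cluster is running with one datanode bound to port 50013, and the client picks up that cluster's configuration; the path /tmp/repro-5mb is an arbitrary example):

              import org.apache.hadoop.conf.Configuration;
              import org.apache.hadoop.fs.FSDataOutputStream;
              import org.apache.hadoop.fs.FileSystem;
              import org.apache.hadoop.fs.Path;

              // Minimal sketch of the reproduction described above. Assumes a running
              // 2-datanode cluster where one datanode uses dfs.datanode.address
              // 0.0.0.0:50013 and the attached toreproduce-5796.patch is applied.
              public class WriteReproSketch {
                public static void main(String[] args) throws Exception {
                  Configuration conf = new Configuration();   // reads core-site.xml / hdfs-site.xml
                  FileSystem fs = FileSystem.get(conf);        // the cluster's DistributedFileSystem

                  FSDataOutputStream out = fs.create(new Path("/tmp/repro-5mb"));
                  byte[] chunk = new byte[64 * 1024];
                  for (int written = 0; written < 5 * 1024 * 1024; written += chunk.length) {
                    out.write(chunk);                          // ~5MB total, spanning many packets
                  }
                  out.close();  // aborts when the port-50013 datanode is last in the pipeline
                }
              }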

          The hunk from the HADOOP-1700 patch that reverts the earlier fix:

          @@ -2214,10 +2218,15 @@
                         /* The receiver thread cancelled this thread. 
                          * We could also check any other status updates from the 
                          * receiver thread (e.g. if it is ok to write to replyOut). 
          +               * It is prudent to not send any more status back to the client
          +               * because this datanode has a problem. The upstream datanode
          +               * will detect a timout on heartbeats and will declare that
          +               * this datanode is bad, and rightly so.
                          */
                         LOG.info("PacketResponder " + block +  " " + numTargets +
                                  " : Thread is interrupted.");
                         running = false;
          +              continue;
                       }
                       
                       if (!didRead) {
          

          I don't think the added justification is always correct.

          Suggested fix:
          ============

          • the loop should 'continue' if the write to the local disk fails.
          • it should not if the write to the downstream mirror fails (this test case); a rough sketch of this control flow follows below.
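
          A rough sketch of the suggested control flow (this is not the actual PacketResponder code in DataNode.java; the class, the localWriteFailed flag, and the stub methods are assumptions made for illustration):

              // Rough sketch only, NOT the real PacketResponder from DataNode.java.
              // It illustrates the suggestion above: go silent (continue) only when this
              // datanode's own disk write failed, so the upstream node times out and
              // blames this node; when the downstream mirror failed, keep responding so
              // the client can identify and eject the downstream node instead.
              public class PacketResponderSketch implements Runnable {
                private volatile boolean running = true;
                private volatile boolean localWriteFailed = false; // set by the receiver on disk errors

                public void run() {
                  while (running) {
                    boolean gotAck = readAckFromDownstream();      // stub for the real ack read

                    if (Thread.interrupted()) {
                      // The receiver thread cancelled this thread.
                      if (localWriteFailed) {
                        // This datanode itself is bad: stay silent and let the upstream
                        // node time out and (rightly) declare this datanode bad.
                        running = false;
                        continue;
                      }
                      // The failure was on the downstream mirror: do NOT go silent here.
                      // Fall through so the error is reported back toward the client.
                    }

                    if (!gotAck) {
                      sendErrorAckUpstream();                      // stub for the real error ack
                      running = false;
                    }
                  }
                }

                private boolean readAckFromDownstream() { return false; } // stub
                private void sendErrorAckUpstream() { }                   // stub
              }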
          Raghu Angadi made changes -
          Priority: Major [ 3 ] → Blocker [ 1 ]
          Raghu Angadi created issue -

             People

             • Assignee: Unassigned
             • Reporter: Raghu Angadi
             • Votes: 0
             • Watchers: 7
