Hadoop HDFS / HDFS-101

DFS write pipeline: DFSClient sometimes does not detect second datanode failure

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 0.20.1, 0.20-append
    • Fix Version/s: 0.20.2, 0.20-append, 0.21.0
    • Component/s: datanode
    • Labels: None
    • Hadoop Flags: Reviewed

      Description

      When the first datanode's write to the second datanode fails or times out, DFSClient ends up marking the first datanode as the bad one and removing it from the pipeline. A similar problem exists on the DataNode as well and was fixed in HADOOP-3339. From HADOOP-3339:

      "The main issue is that BlockReceiver thread (and DataStreamer in the case of DFSClient) interrupt() the 'responder' thread. But interrupting is a pretty coarse control. We don't know what state the responder is in and interrupting has different effects depending on responder state. To fix this properly we need to redesign how we handle these interactions."

      When the first datanode closes its socket from DFSClient, DFSClient should properly read all the data left in the socket. Also, the DataNode's closing of the socket should not result in a TCP reset; otherwise I think DFSClient will not be able to read from the socket.
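      As a sketch of the close-side behavior described above, assuming plain java.net sockets (the GracefulClose helper is hypothetical, not a Hadoop API): the closer signals end-of-stream with shutdownOutput() and drains whatever the peer has already sent before calling close(). Closing a socket while unread bytes sit in its receive buffer is what typically causes the kernel to send a TCP reset instead of a normal FIN, which is exactly the reset warned about here:

          import java.io.IOException;
          import java.io.InputStream;
          import java.net.Socket;

          // Hypothetical helper, not a Hadoop API: close a socket so the peer
          // sees a normal end-of-stream (FIN) rather than a TCP reset (RST).
          final class GracefulClose {
              static void closeGracefully(Socket socket) throws IOException {
                  socket.shutdownOutput();        // FIN: "no more data from us"
                  InputStream in = socket.getInputStream();
                  byte[] scratch = new byte[4096];
                  // Drain anything the peer already sent; unread bytes left in
                  // the receive buffer at close() are what trigger the reset.
                  while (in.read(scratch) != -1) {
                      // discard drained bytes
                  }
                  socket.close();                 // both directions done: clean close
              }
          }

      The mirror-image obligation on the DFSClient side is to keep reading until end-of-stream, so that the datanode's final ack (including any status identifying which downstream node actually failed) is not lost.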

      Attachments

      1. hdfs-101-branch-0.20-append-cdh3.txt
        11 kB
        Todd Lipcon
      2. HDFS-101_20-append.patch
        12 kB
        Nicolas Spiegelberg
      3. pipelineHeartbeat_yahoo.patch
        4 kB
        Hairong Kuang
      4. pipelineHeartbeat.patch
        4 kB
        Hairong Kuang
      5. detectDownDN3-0.20-yahoo.patch
        10 kB
        Hairong Kuang
      6. detectDownDN3.patch
        8 kB
        Hairong Kuang
      7. detectDownDN3-0.20.patch
        9 kB
        Hairong Kuang
      8. detectDownDN2.patch
        7 kB
        Hairong Kuang
      9. detectDownDN1-0.20.patch
        9 kB
        Hairong Kuang
      10. hdfs-101.tar.gz
        3 kB
        Todd Lipcon
      11. detectDownDN-0.20.patch
        8 kB
        Hairong Kuang

          Activity

          Raghu Angadi created issue -
          Raghu Angadi made changes -
          Description: edited (old and new text identical to the description above)
          Component/s dfs [ 12310710 ]
          Raghu Angadi made changes -
          Link This issue relates to HADOOP-3339 [ HADOOP-3339 ]
          dhruba borthakur made changes -
          Link This issue blocks HADOOP-4278 [ HADOOP-4278 ]
          Owen O'Malley made changes -
          Project Hadoop Common [ 12310240 ] HDFS [ 12310942 ]
          Key HADOOP-3416 HDFS-101
          Affects Version/s 0.16.0 [ 12312740 ]
          Component/s dfs [ 12310710 ]
          Kan Zhang made changes -
          Link This issue relates to HDFS-564 [ HDFS-564 ]
          Robert Chansler made changes -
          Fix Version/s 0.21.0 [ 12314046 ]
          Priority Major [ 3 ] Blocker [ 1 ]
          Description: edited (old and new text identical to the description above)
          Hairong Kuang made changes -
          Assignee Hairong Kuang [ hairong ]
          Hairong Kuang made changes -
          Link This issue is blocked by HDFS-793 [ HDFS-793 ]
          Hairong Kuang made changes -
          Link This issue is duplicated by HDFS-795 [ HDFS-795 ]
          Todd Lipcon made changes -
          Affects Version/s 0.20.1 [ 12314048 ]
          Hairong Kuang made changes -
          Attachment detectDownDN.patch [ 12428262 ]
          Hairong Kuang made changes -
          Attachment detectDownDN-0.20.patch [ 12428361 ]
          Hairong Kuang made changes -
          Attachment detectDownDN1.patch [ 12428366 ]
          Hairong Kuang made changes -
          Status Open [ 1 ] Patch Available [ 10002 ]
          Todd Lipcon made changes -
          Attachment hdfs-101.tar.gz [ 12428370 ]
          Hairong Kuang made changes -
          Attachment detectDownDN1-0.20.patch [ 12428383 ]
          Hairong Kuang made changes -
          Attachment detectDownDN2.patch [ 12428384 ]
          Hairong Kuang made changes -
          Attachment detectDownDN.patch [ 12428262 ]
          Hairong Kuang made changes -
          Attachment detectDownDN1.patch [ 12428366 ]
          Hairong Kuang made changes -
          Attachment detectDownDN3-0.20.patch [ 12428498 ]
          Attachment detectDownDN3.patch [ 12428499 ]
          Hairong Kuang made changes -
          Status Patch Available [ 10002 ] Resolved [ 5 ]
          Hadoop Flags [Reviewed]
          Fix Version/s 0.20.2 [ 12314204 ]
          Fix Version/s 0.22.0 [ 12314241 ]
          Resolution Fixed [ 1 ]
          Tsz Wo Nicholas Sze made changes -
          Link This issue incorporates HDFS-700 [ HDFS-700 ]
          Suresh Srinivas made changes -
          Attachment detectDownDN4-0.20.patch [ 12436670 ]
          Hairong Kuang made changes -
          Attachment detectDownDN4-0.20.patch [ 12436670 ]
          Hairong Kuang made changes -
          Attachment detectDownDN3-0.20-yahoo.patch [ 12437819 ]
          Hairong Kuang made changes -
          Attachment pipelineHeartbeat.patch [ 12439379 ]
          Hairong Kuang made changes -
          Attachment pipelineHeartbeat_yahoo.patch [ 12439387 ]
          Tom White made changes -
          Fix Version/s 0.22.0 [ 12314241 ]
          Nicolas Spiegelberg made changes -
          Affects Version/s 0.20-append [ 12315103 ]
          Nicolas Spiegelberg made changes -
          Link This issue blocks HDFS-142 [ HDFS-142 ]
          Nicolas Spiegelberg made changes -
          Attachment HDFS-101_20-append.patch [ 12446611 ]
          Todd Lipcon made changes -
          Attachment hdfs-101-branch-0.20-append-cdh3.txt [ 12447257 ]
          dhruba borthakur made changes -
          Fix Version/s 0.20-append [ 12315103 ]
          Tsz Wo Nicholas Sze made changes -
          Component/s data-node [ 12312927 ]
          Hairong Kuang made changes -
          Link This issue relates to HDFS-1346 [ HDFS-1346 ]
          Tom White made changes -
          Status Resolved [ 5 ] Closed [ 6 ]
          Tsz Wo Nicholas Sze made changes -
          Link This issue is related to HDFS-1595 [ HDFS-1595 ]
          Gavin made changes -
          Link This issue blocks HDFS-142 [ HDFS-142 ]
          Gavin made changes -
          Link This issue is depended upon by HDFS-142 [ HDFS-142 ]

            People

            • Assignee: Hairong Kuang
            • Reporter: Raghu Angadi
            • Votes: 0
            • Watchers: 12
