Details
Type: Bug
Status: Closed
Priority: Blocker
Resolution: Fixed
Fix Version/s: 0.20.1, 0.20-append
Component/s: None
Hadoop Flags: Reviewed
Description
When the first datanode's write to the second datanode fails or times out, DFSClient ends up marking the first datanode as the bad one and removing it from the pipeline. A similar problem exists on the DataNode as well; it was fixed in HADOOP-3339. From HADOOP-3339:
"The main issue is that BlockReceiver thread (and DataStreamer in the case of DFSClient) interrupt() the 'responder' thread. But interrupting is a pretty coarse control. We don't know what state the responder is in and interrupting has different effects depending on responder state. To fix this properly we need to redesign how we handle these interactions."
When the first datanode closes its socket to DFSClient, DFSClient should properly read all the data left in the socket. Also, the DataNode's closing of the socket should not result in a TCP reset; otherwise, I think DFSClient will not be able to read from the socket.
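The TCP-reset point in the last paragraph can be illustrated with a small, self-contained sketch using plain java.net sockets (illustrative only, not the actual DataNode or DFSClient code; the class and method names are made up): a normal close() sends a FIN and the peer can still read whatever was flushed before the close, while enabling SO_LINGER with a zero timeout makes close() abort the connection with a RST, after which the peer's reads fail and buffered data may be lost.

import java.io.IOException;
import java.io.OutputStream;
import java.net.Socket;

/**
 * Illustrative sketch only (not Hadoop code): how the sender closes the
 * socket determines whether the receiver can still read what was written.
 */
public class CloseModeSketch {

    /**
     * Writes a final message and then closes the socket.
     *
     * If abortive is true, SO_LINGER is enabled with a zero timeout, so
     * close() aborts the connection with a TCP RST and any data the peer has
     * not yet read may be discarded on its side. If false, close() sends a
     * normal FIN and the peer can drain everything flushed before the close.
     */
    static void writeAndClose(Socket socket, boolean abortive) throws IOException {
        OutputStream out = socket.getOutputStream();
        out.write("last ack\n".getBytes("UTF-8"));
        out.flush();
        if (abortive) {
            socket.setSoLinger(true, 0); // abortive close: RST instead of FIN
        }
        socket.close();
    }
}

In the scenario described above, the desired behaviour corresponds to the non-abortive path: the datanode flushes what it has and closes normally, and DFSClient drains the remaining acks from the socket before deciding which datanode actually failed.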
Attachments
Issue Links
- blocks
  - HADOOP-4278 TestDatanodeDeath failed occasionally (Closed)
- incorporates
  - HDFS-700 BlockReceiver is ignoring java.io.InterruptedIOException. (Resolved)
- is blocked by
  - HDFS-793 DataNode should first receive the whole packet ack message before it constructs and sends its own ack message for the packet (Closed)
- is depended upon by
  - HDFS-142 In 0.20, move blocks being written into a blocksBeingWritten directory (Closed)
- is duplicated by
  - HDFS-795 DFS Write pipeline does not detect defective datanode correctly in some cases (HADOOP-3339) (Resolved)
- is related to
  - HDFS-1595 DFSClient may incorrectly detect datanode failure (Resolved)
- relates to
  - HDFS-564 Adding pipeline test 17-35 (Closed)
  - HADOOP-3339 DFS Write pipeline does not detect defective datanode correctly if it times out. (Closed)
  - HDFS-1346 DFSClient receives out of order packet ack (Closed)