Hadoop HDFS / HDFS-10490

Client may never recover replica after a timeout during sending packet


Details

    • Type: Bug
    • Status: Patch Available
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 2.6.0
    • Fix Version/s: None
    • Component/s: datanode
    • Labels: None

    Description

      For a newly created replica, the meta file is created in the constructor of BlockReceiver (for the WRITE_BLOCK op). Its header is written lazily (buffered in memory first by BufferedOutputStream).
      If subsequent packets fail to deliver (e.g. under extreme network conditions), the header may never get flushed until the stream is closed.
      However, BlockReceiver does not call close until block receiving finishes or an exception is encountered. Also, under extreme network conditions, neither the RST nor the FIN may be delivered in time.
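      The hazard, as a minimal standalone sketch (not the actual BlockReceiver code; the file name and buffer size here are made up):

        import java.io.BufferedOutputStream;
        import java.io.FileOutputStream;
        import java.io.IOException;

        // The header is "written", but only into the in-memory buffer: until
        // flush() or close() runs, the on-disk meta file remains 0 bytes long.
        public class LazyHeaderSketch {
          public static void main(String[] args) throws IOException {
            byte[] header = new byte[7]; // a meta header is only a few bytes
            BufferedOutputStream out =
                new BufferedOutputStream(new FileOutputStream("demo.meta"), 8192);
            out.write(header);
            // No flush here: if the writer now blocks on the network forever,
            // any concurrent reader of demo.meta observes an empty file.
          }
        }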

      In this case, if the client initiates a transferBlock to a new datanode (in addDatanode2ExistingPipeline), the existing datanode will see an empty meta file if its BlockReceiver has not closed in time.
      Then, after HDFS-3429, a default DataChecksum (NULL, 512) is used during the transfer. So when the client then tries to recover the pipeline after the transfer completes, it may encounter the following exception:

      java.io.IOException: Client requested checksum DataChecksum(type=CRC32C, chunkSize=4096) when appending to an existing block with different chunk size: DataChecksum(type=NULL, chunkSize=512)
              at org.apache.hadoop.hdfs.server.datanode.ReplicaInPipeline.createStreams(ReplicaInPipeline.java:230)
              at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.<init>(BlockReceiver.java:226)
              at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:798)
              at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:166)
              at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:76)
              at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:243)
              at java.lang.Thread.run(Thread.java:745)
      

      This will repeat until the datanode replacement policy is exhausted.
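      For reference, the HDFS-3429 fallback that produces the NULL/512 checksum behaves roughly like the sketch below when the meta file is empty (a paraphrase from my reading of the code; exact names may differ across versions):

        import org.apache.hadoop.hdfs.server.datanode.BlockMetadataHeader;
        import org.apache.hadoop.util.DataChecksum;
        import java.io.DataInputStream;
        import java.io.IOException;

        public class ChecksumFallbackSketch {
          // If the meta file is too short to hold a header, assume a default
          // DataChecksum(type=NULL, chunkSize=512); this is what the new
          // datanode records during transfer, producing the mismatch above.
          static DataChecksum checksumFor(DataInputStream metaIn, long metaFileLength)
              throws IOException {
            if (metaFileLength < BlockMetadataHeader.getHeaderSize()) {
              return DataChecksum.newDataChecksum(DataChecksum.Type.NULL, 512);
            }
            return BlockMetadataHeader.readHeader(metaIn).getChecksum();
          }
        }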

      Also note that, with bad luck (as in my case), 20k clients may all be doing this at once. It is, to some extent, a DDoS attack on the NameNode (because of the getAdditionalDatanode calls).

      I suggest we flush immediately after the header is written, preventing anybody from seeing an empty meta file and thus avoiding the issue.
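      A minimal sketch of the idea (the attached patch is the authoritative change; the class and file names here are made up):

        import org.apache.hadoop.hdfs.server.datanode.BlockMetadataHeader;
        import org.apache.hadoop.util.DataChecksum;
        import java.io.BufferedOutputStream;
        import java.io.DataOutputStream;
        import java.io.FileOutputStream;
        import java.io.IOException;

        public class EagerHeaderFlush {
          public static void main(String[] args) throws IOException {
            DataChecksum checksum =
                DataChecksum.newDataChecksum(DataChecksum.Type.CRC32C, 512);
            DataOutputStream checksumOut = new DataOutputStream(
                new BufferedOutputStream(new FileOutputStream("demo.meta")));
            BlockMetadataHeader.writeHeader(checksumOut, checksum);
            // The suggested addition: push the header to disk immediately, so
            // a concurrent reader can never observe a 0-byte meta file.
            checksumOut.flush();
          }
        }

      This costs one extra small write per block, but it closes the window in which the meta file exists yet is empty.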

      Attachments

        1. HDFS-10490.patch (0.8 kB, He Tianyi)
        2. HDFS-10490.0001.patch (5 kB, He Tianyi)

              People

                Assignee: Unassigned
                Reporter: He Tianyi
                Votes: 0
                Watchers: 7
