Status: Patch Available
Affects Version/s: 2.6.0
Fix Version/s: None
For a newly created replica, the meta file is created in the constructor of BlockReceiver (for the WRITE_BLOCK op). Its header is written lazily: the bytes are first buffered in memory by a BufferedOutputStream.
If subsequent packets fail to arrive (e.g. under extreme network conditions), the header may never be flushed until the file is closed.
However, BlockReceiver does not call close until block receiving finishes or an exception is encountered. Also, under extreme network conditions, neither RST nor FIN may arrive in time.
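The buffering behavior can be demonstrated with plain JDK streams (not the real HDFS classes; the header bytes below are purely illustrative): the header sits in the BufferedOutputStream's internal buffer and is invisible to any other reader until flush() or close().

```java
import java.io.BufferedOutputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;

// Stand-in for the meta file's output path: a ByteArrayOutputStream plays
// the role of the on-disk file, BufferedOutputStream the role of metaOut.
public class LazyHeaderDemo {
    // Illustrative header bytes; the real meta header layout is HDFS-specific.
    static final byte[] HEADER = {0, 1, 1, 0, 0, 2, 0};

    // Mirrors the buggy path: the header is written but never flushed.
    static void writeHeaderNoFlush(BufferedOutputStream out) throws IOException {
        out.write(HEADER);
        // no flush(): the bytes stay in the in-memory buffer
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream disk = new ByteArrayOutputStream(); // the "file"
        BufferedOutputStream out = new BufferedOutputStream(disk);

        writeHeaderNoFlush(out);
        System.out.println("before flush: " + disk.size() + " bytes visible"); // 0

        out.flush();
        System.out.println("after flush:  " + disk.size() + " bytes visible"); // 7
    }
}
```

Until the flush, another process reading the file sees zero bytes, which is exactly the empty-meta window described above.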
In this case, if the client initiates a transferBlock to a new datanode (in addDatanode2ExistingPipeline), the existing datanode will see an empty meta file if its BlockReceiver has not closed in time.
Because of HDFS-3429, a default DataChecksum (NULL, 512) is used during the transfer. So when the client then tries to recover the pipeline after the transfer completes, it may encounter the following exception:
This repeats until the datanode replacement policy is exhausted.
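A simplified sketch of the fallback mechanism (not the actual HDFS code; the header layout here is hypothetical): when the meta file is empty or truncated, reading the header fails, and the transfer falls back to a default NULL/512 checksum that later clashes with the replica's real checksum during pipeline recovery.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;

// Illustrative model of the HDFS-3429 fallback: an unreadable meta header
// yields a default DataChecksum(NULL, 512) for the transferred replica.
public class MetaHeaderFallback {
    // Hypothetical, simplified header: 2-byte version, 1-byte checksum type,
    // 4-byte bytes-per-checksum. The real on-disk layout differs.
    static String readChecksumDescription(byte[] metaBytes) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(metaBytes))) {
            short version = in.readShort();
            byte type = in.readByte();
            int bytesPerChecksum = in.readInt();
            return "type=" + type + ", bpc=" + bytesPerChecksum + ", v=" + version;
        } catch (IOException e) {
            // Empty/truncated meta: fall back to the NULL, 512 default,
            // which mismatches the original checksum on recovery.
            return "type=NULL, bpc=512 (default)";
        }
    }

    public static void main(String[] args) {
        // An empty meta file, as produced by the unflushed BlockReceiver:
        System.out.println(readChecksumDescription(new byte[0]));
    }
}
```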
Note also that, with bad luck (as in my case), 20k clients may all be doing this at once. It amounts to a DDoS attack on the NameNode (because of the getAdditionalDataNode calls).
I suggest we flush immediately after the header is written, so that nobody can ever observe an empty meta file. This avoids the issue.
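The proposed fix is essentially one line. A minimal sketch with plain JDK streams (again standing in for the real BlockReceiver constructor and metaOut stream):

```java
import java.io.BufferedOutputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;

// Sketch of the fix: flush right after the header is written, so a
// concurrent reader never observes a zero-length meta file.
public class EagerHeaderFlush {
    static final byte[] HEADER = {0, 1, 1, 0, 0, 2, 0}; // illustrative header bytes

    static void writeHeaderAndFlush(BufferedOutputStream out) throws IOException {
        out.write(HEADER);
        out.flush(); // the proposed one-line change: push the header through immediately
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream disk = new ByteArrayOutputStream(); // the "file"
        BufferedOutputStream out = new BufferedOutputStream(disk);
        writeHeaderAndFlush(out);
        System.out.println("visible bytes: " + disk.size()); // header already visible
    }
}
```

The cost is one extra small write per block creation; in exchange, the empty-meta window that triggers the NULL/512 fallback is closed.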