The following sequence of events results in a replica being mistakenly marked corrupt:
1. Pipeline is open with 2 replicas
2. DN1 generates a block report but is slow in sending it to the NN (e.g. because of a flaky network). It gets "stuck" right before the block report RPC.
3. Client closes the file.
4. DN2 is fast and sends blockReceived to the NN. The NN marks the block as COMPLETE.
5. DN1's block report proceeds, and includes the block in an RBW state.
6. NN incorrectly marks the replica as corrupt, since it is an RBW replica on a COMPLETE block.
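
The race above can be replayed with a toy model. This is only a sketch for illustration, not the actual NameNode code: `ToyNameNode`, `blockReceived`, `processReportedReplica`, and `replay` are hypothetical names, and completing the block on the first blockReceived is a simplification of the real behavior.

```java
import java.util.ArrayList;
import java.util.List;

public class RbwReportRace {
    enum BlockState { UNDER_CONSTRUCTION, COMPLETE }
    enum ReplicaState { RBW, FINALIZED }

    /** Hypothetical stand-in for the NN's view of a single block. */
    static class ToyNameNode {
        BlockState blockState = BlockState.UNDER_CONSTRUCTION;
        final List<String> corruptReplicas = new ArrayList<>();

        // Simplification: the first blockReceived completes the block.
        void blockReceived(String dn) {
            blockState = BlockState.COMPLETE;
        }

        // Naive rule from step 6: an RBW replica reported for a
        // COMPLETE block is treated as corrupt.
        void processReportedReplica(String dn, ReplicaState reported) {
            if (blockState == BlockState.COMPLETE && reported == ReplicaState.RBW) {
                corruptReplicas.add(dn);
            }
        }
    }

    /** Replays steps 1 to 6 and returns the DNs whose replica was marked corrupt. */
    static List<String> replay() {
        ToyNameNode nn = new ToyNameNode();            // step 1: pipeline open
        ReplicaState dn1Snapshot = ReplicaState.RBW;   // step 2: report generated, not yet sent
        // step 3: client closes the file; DN1's stale snapshot is unchanged
        nn.blockReceived("DN2");                       // step 4: block becomes COMPLETE
        nn.processReportedReplica("DN1", dn1Snapshot); // step 5: delayed report arrives
        return nn.corruptReplicas;                     // step 6: DN1 wrongly flagged
    }

    public static void main(String[] args) {
        System.out.println("Replicas marked corrupt: " + replay());
    }
}
```

In this model DN1 is flagged even though its on-disk replica may already be finalized: the report only reflects the state at the time it was generated, before the close.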
Related to: HDFS-2691 HA: Tests and fixes for pipeline targets and replica recovery