The proposal in the comment above is simple, but it has an unresolvable flaw: the client would need READ permission to read the partial chunk, which a writer is not required to have.
So instead of switching to a different solution, I will focus on making the existing one work. Here is the plan:
1. The client does not set appendChunk to false until the first chunk has been sent;
2. If the datanode receives a request to append "bcd" to a partial chunk "a", but "bc" has already been written to disk by a previous hflush, the datanode will read "abc" from disk, compute the checksum of "abcd", and then write "d" and the new checksum to disk. In the current trunk and 0.21, the datanode mistakenly computes the CRC of "abcbcd".
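To make step 2 concrete, here is a minimal sketch of the intended checksum computation. It is not the actual datanode code: the class and method names are hypothetical, byte arrays stand in for the on-disk partial chunk and the incoming packet, and java.util.zip.CRC32 stands in for whatever checksum type the block uses.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

// Hypothetical illustration only; names are not taken from the datanode source.
final class PartialChunkChecksum {

    /** CRC32 over a whole byte array. */
    static long crc(byte[] data) {
        CRC32 c = new CRC32();
        c.update(data, 0, data.length);
        return c.getValue();
    }

    /**
     * Checksum of the chunk after the append.
     *
     * @param onDisk              bytes of the partial chunk already on disk, e.g. "abc"
     * @param packet              bytes sent by the client, e.g. "bcd"
     * @param packetOffsetInChunk offset inside the chunk where the packet starts, e.g. 1
     */
    static long crcAfterAppend(byte[] onDisk, byte[] packet, int packetOffsetInChunk) {
        // Bytes at the head of the packet that a previous hflush already wrote.
        int overlap = Math.max(0, onDisk.length - packetOffsetInChunk);
        overlap = Math.min(overlap, packet.length);

        CRC32 c = new CRC32();
        c.update(onDisk, 0, onDisk.length);                 // "abc" read back from disk
        c.update(packet, overlap, packet.length - overlap); // only the new tail, "d"
        return c.getValue();
    }

    public static void main(String[] args) {
        byte[] onDisk = "abc".getBytes(StandardCharsets.US_ASCII);
        byte[] packet = "bcd".getBytes(StandardCharsets.US_ASCII);
        long fixed = crcAfterAppend(onDisk, packet, 1);
        long expected = crc("abcd".getBytes(StandardCharsets.US_ASCII));
        System.out.println("matches crc(\"abcd\"): " + (fixed == expected));
    }
}
```

The bug described above corresponds to feeding the whole packet into the checksum after the on-disk bytes, that is, skipping no overlap at all.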
I will also implement the proposed optimization: a block receiver at the datanode does not overwrite the block file if a packet starts with data that the replica file already has. For the crc file, only the last 4 bytes are allowed to be overwritten. This optimization makes the datanode-side fix easy to implement.
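The write rule in this optimization could look roughly like the sketch below. Everything here is an assumption made for illustration: in-memory arrays stand in for the block file and the crc file, the method names are hypothetical, and the checksum is assumed to be a 4-byte big-endian integer.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

// Hypothetical illustration only; arrays stand in for the replica's files.
final class ReplicaOverwriteSketch {
    static final int CHECKSUM_SIZE = 4; // assumed size of one checksum entry

    /**
     * Returns the block file contents after receiving a packet, without
     * rewriting bytes the replica already has.
     *
     * @param packetOffsetInBlock offset in the block where the packet starts
     */
    static byte[] appendNewBytesOnly(byte[] blockFile, byte[] packet, int packetOffsetInBlock) {
        // Leading packet bytes that are already on disk are skipped, not overwritten.
        int alreadyOnDisk = Math.max(0, blockFile.length - packetOffsetInBlock);
        alreadyOnDisk = Math.min(alreadyOnDisk, packet.length);

        int newBytes = packet.length - alreadyOnDisk;
        byte[] out = Arrays.copyOf(blockFile, blockFile.length + newBytes);
        System.arraycopy(packet, alreadyOnDisk, out, blockFile.length, newBytes);
        return out;
    }

    /** Overwrites only the last checksum entry of the crc file contents. */
    static byte[] replaceLastChecksum(byte[] crcFile, int newCrc) {
        if (crcFile.length < CHECKSUM_SIZE) {
            throw new IllegalArgumentException("crc file holds no checksum to replace");
        }
        byte[] out = Arrays.copyOf(crcFile, crcFile.length);
        // Absolute put: touches exactly the final 4 bytes (big-endian by default).
        ByteBuffer.wrap(out).putInt(out.length - CHECKSUM_SIZE, newCrc);
        return out;
    }
}
```

With this rule, a block file holding "abc" that receives "bcd" at offset 1 grows by just "d", and every earlier checksum entry in the crc file stays untouched.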