I think we should still explain how that can lead to what we saw:
In one particular case I looked at, one datanode did not write 64k of data (or overwrote the last 64k):
The last (third) datanode in the pipeline failed with:
2008-03-17 20:38:01,928 INFO org.apache.hadoop.dfs.DataNode: Changing block file offset of block blk_7114623733442731588 from 85983232 to 86048768 meta file offset to 672263
2008-03-17 20:38:01,928 INFO org.apache.hadoop.dfs.DataNode: Exception in receiveBlock for block blk_7114623733442731588 java.io.IOException: Trying to change block file offset of block blk_7114623733442731588 to 86048768 but actual size of file is 85983232
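As a rough sketch (a simplification, not the actual DataNode code), the check implied by that message is that the receiver refuses to seek the block file forward past the bytes actually on disk:

import java.io.IOException;

public class OffsetCheck {
    // Simplified stand-in for the datanode's sanity check: a seek target
    // beyond the bytes already written means a packet's worth of data is missing.
    static void changeBlockFileOffset(long actualFileSize, long newOffset) throws IOException {
        if (newOffset > actualFileSize) {
            throw new IOException("Trying to change block file offset to " + newOffset
                + " but actual size of file is " + actualFileSize);
        }
        // otherwise seek to newOffset and keep writing
    }

    public static void main(String[] args) throws IOException {
        changeBlockFileOffset(85983232L, 86048768L); // reproduces the failure above
    }
}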
The client retried with the remaining DNs and succeeded.
Say 'x' == 85983232.
The block file in the tmp dir on the bad datanode is x bytes long and the metadata file is 672263 bytes long.
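Note that the meta file length lines up with the larger offset, not with x. Assuming the usual meta file layout (a 7-byte header plus one 4-byte CRC per 512 bytes of data; those constants are my assumption, not from the logs), 672263 covers exactly 86048768 bytes of data, i.e. the checksums run 64k past the end of the block file:

public class MetaOffsetCheck {
    static final int HEADER_LEN = 7;      // assumed meta file header size
    static final int BYTES_PER_CRC = 512; // assumed checksum chunk size
    static final int CRC_LEN = 4;         // CRC32 is 4 bytes

    static long metaOffsetFor(long dataOffset) {
        return HEADER_LEN + (dataOffset / BYTES_PER_CRC) * CRC_LEN;
    }

    public static void main(String[] args) {
        System.out.println(metaOffsetFor(86048768L)); // 672263, matches the log
        System.out.println(metaOffsetFor(85983232L)); // 671751, what x alone would need
    }
}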
Comparing data for this block from the failed datanode and a good datanode shows that the data up to offset x-64k matches on both. The 64k at offset x-64k on the bad datanode matches the 64k at offset x on the good datanode. The metadata file contents match on both sides. So the bad datanode either somehow did not write the second-to-last packet or overwrote it with the last packet. Each packet has 64k of real data.
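For reference, this is roughly the comparison I did (a minimal sketch; the block file paths are hypothetical placeholders):

import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.Arrays;

public class BlockCompare {
    static final int PACKET = 64 * 1024; // 64k of real data per packet

    static byte[] readAt(String path, long off, int len) throws IOException {
        try (RandomAccessFile f = new RandomAccessFile(path, "r")) {
            byte[] buf = new byte[len];
            f.seek(off);
            f.readFully(buf);
            return buf;
        }
    }

    public static void main(String[] args) throws IOException {
        long x = 85983232L; // block file length on the bad datanode
        String bad = "bad/blk_7114623733442731588";   // hypothetical path
        String good = "good/blk_7114623733442731588"; // hypothetical path

        // The 64k at x-64k on the bad node equals the 64k at x on the good
        // node: the bad node's last packet landed one packet too early.
        byte[] badTail = readAt(bad, x - PACKET, PACKET);
        byte[] goodAtX = readAt(good, x, PACKET);
        System.out.println("shifted last packet: " + Arrays.equals(badTail, goodAtX));
        // (the bytes before x-64k compare equal on both nodes in the same way)
    }
}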