Sorry, Yanbo. I thought I had replied to your comment. This problem was identified in branch-1 in a few deployed environments. I will try your tests with trunk and get back to you soon.
Part of the problem here is that getAdditionalBlock() (and thus addBlock()) is not truly idempotent. When the client, the namenode, or the network between them fails, it can leave a block ID assigned on the namenode with no corresponding block created on a datanode.
If addBlock() were truly idempotent, the namenode could identify and delete the dangling block ID when it receives the repeated addBlock() request. One way to make this API idempotent is to add the offset as an input parameter, so the namenode can check the offset to determine whether a request is a repeat. I will upload a patch for that.
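
To illustrate the idea, here is a rough sketch of the offset check. It is a toy model, not the actual namenode code or the patch: the class ToyNamenode, its Block type, and the addBlock(path, offset) signature are all hypothetical, and it assumes the client passes the file offset at which the requested block should start.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model only: not the real namenode, and every name here is hypothetical.
class ToyNamenode {
    // One allocated block: its ID and the file offset at which it starts.
    static final class Block {
        final long id;
        final long startOffset;

        Block(long id, long startOffset) {
            this.id = id;
            this.startOffset = startOffset;
        }
    }

    private final Map<String, List<Block>> files = new HashMap<>();
    private long nextBlockId = 1;

    // Assumed signature: the client sends the offset where the new block begins.
    synchronized long addBlock(String path, long offset) {
        List<Block> blocks = files.computeIfAbsent(path, p -> new ArrayList<>());
        if (!blocks.isEmpty()) {
            Block last = blocks.get(blocks.size() - 1);
            if (last.startOffset == offset) {
                // Same offset as the last allocation: this is a repeated request,
                // so the previously assigned block ID is dangling. Drop it.
                blocks.remove(blocks.size() - 1);
            } else if (offset < last.startOffset) {
                throw new IllegalArgumentException(
                        "offset " + offset + " is before the last allocated block");
            }
        }
        Block block = new Block(nextBlockId++, offset);
        blocks.add(block);
        return block.id;
    }

    // Returns the block IDs currently assigned to the file, in order.
    synchronized List<Long> blockIds(String path) {
        List<Long> ids = new ArrayList<>();
        for (Block b : files.getOrDefault(path, new ArrayList<>())) {
            ids.add(b.id);
        }
        return ids;
    }

    public static void main(String[] args) {
        ToyNamenode nn = new ToyNamenode();
        nn.addBlock("/f", 0L);   // first attempt; assume the response is lost
        nn.addBlock("/f", 0L);   // retry at the same offset replaces the dangling ID
        nn.addBlock("/f", 128L); // next block at a new offset is simply appended
        System.out.println(nn.blockIds("/f"));
    }
}
```

In this sketch a retry at the same offset discards the earlier block ID and assigns a fresh one, so the file never keeps two block IDs for the same offset. Returning the earlier ID instead of replacing it would be another option; which is safer depends on whether a datanode may already have started writing that block.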