I've attached 2 preliminary patches.
h564-24.patch is a patch for the pre-append-merge trunk. It changes the behavior of BlockReceiver.java in 2 ways. 1) When a downstream error happens (ending up in handleMirrorOutError()), the receiver thread no longer interrupts the responder thread; that interrupt could lead the responder to behave as if a local error had occurred and give the wrong idea to the upstream node. 2) The responder tries to read all downstream statuses (up to the first ERROR status) before sending its own status and forwarding the others to the upstream node. If the responder fails to read all the downstream statuses it needs, it marks the next downstream datanode as ERROR.
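To make change 2) concrete, here is a minimal standalone sketch of the responder's status handling. It is not code from the patch: the class ResponderSketch, the methods buildReply and streamOf, and the SUCCESS/ERROR values are all hypothetical, and statuses are assumed to arrive as 2-byte shorts. It also assumes "next downstream datanode" means the immediate downstream neighbor, whose socket is the one that failed to deliver the statuses.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not the BlockReceiver code from the patch.
class ResponderSketch {
    static final short SUCCESS = 0; // assumed status values
    static final short ERROR = 1;

    /**
     * Builds the reply sent upstream: this node's own status first,
     * followed by the downstream statuses, stopping at the first ERROR.
     */
    static List<Short> buildReply(DataInputStream downstreamIn, int numDownstream) {
        List<Short> reply = new ArrayList<>();
        reply.add(SUCCESS); // the local write is assumed to have succeeded
        try {
            for (int i = 0; i < numDownstream; i++) {
                short status = downstreamIn.readShort();
                reply.add(status);
                if (status == ERROR) {
                    break; // nothing past the first ERROR is read
                }
            }
        } catch (IOException e) {
            // Could not read every status we needed (EOF, reset, ...).
            // Discard partial results and blame the immediate downstream node.
            reply.subList(1, reply.size()).clear();
            reply.add(ERROR);
        }
        return reply;
    }

    /** Test helper: an in-memory stream holding the given statuses as shorts. */
    static DataInputStream streamOf(int... statuses) {
        byte[] bytes = new byte[2 * statuses.length];
        for (int i = 0; i < statuses.length; i++) {
            bytes[2 * i] = (byte) (statuses[i] >> 8); // high byte first
            bytes[2 * i + 1] = (byte) statuses[i];
        }
        return new DataInputStream(new ByteArrayInputStream(bytes));
    }

    public static void main(String[] args) {
        // Two downstream nodes, but the stream ends after one status.
        System.out.println(buildReply(streamOf(0), 2));
    }
}
```

The point of the sketch is that a failed read is reported as a downstream ERROR while the node's own status stays SUCCESS, so the upstream node is not misled into blaming this node.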
h564-24.patch implements all the tests except 26-28 and 31-33. In tests 26-28, I've seen intermittent failures similar to those described in HDFS-101: when the first datanode sends all statuses to DFSClient and closes the socket, DFSClient isn't able to read those statuses and gets a TCP reset instead. As a result, DFSClient mistakenly considers the first datanode at fault. In tests 31-33, DFSClient keeps receiving seqno == -1 (keep-alive) and hangs.
h564-25.patch is a quick port of h564-24.patch to the current post-append-merge trunk. Unfortunately, many of the tests are failing and I may not have time to investigate them. Hopefully someone can pick it up from here.