HDFS-5583, the interrupted flag was not consumed before join(), so join() always threw InterruptedException immediately and never actually waited for the thread. I noticed threads terminating unexpectedly early and traced it to the uncleared flag.
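To illustrate the uncleared-flag behavior in isolation, here is a minimal standalone sketch. It is not the DataNode code; the class and method names are made up for the demo. It only relies on standard java.lang.Thread behavior: join() throws InterruptedException at once if the caller's interrupt flag is already set, and Thread.interrupted() reads and clears that flag.

```java
public class InterruptedJoinDemo {
    /**
     * Hypothetical helper: returns true if join() threw InterruptedException
     * because of an interrupt flag that was set before join() was called.
     */
    static boolean joinThrows(boolean clearFlagFirst) {
        Thread worker = new Thread(() -> {
            try {
                Thread.sleep(200); // stand-in for real work
            } catch (InterruptedException ignored) {
                // nobody interrupts the worker in this demo
            }
        });
        worker.start();

        // Simulate an earlier interrupt that was never consumed.
        Thread.currentThread().interrupt();
        if (clearFlagFirst) {
            Thread.interrupted(); // reads AND clears the flag
        }

        boolean threw = false;
        try {
            worker.join(5000);
        } catch (InterruptedException e) {
            threw = true; // throwing also clears the flag
        }

        // Clean up: the flag is clear at this point, so this join really waits.
        try {
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return threw;
    }

    public static void main(String[] args) {
        System.out.println("stale flag, join threw: " + joinThrows(false));
        System.out.println("flag cleared, join threw: " + joinThrows(true));
    }
}
```

With the stale flag, join() should return control immediately through the exception while the worker is still running; once the flag is cleared first, join() waits for the worker as intended.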
There are two flaws.
1) In the failing test case, the responder thread is blocked waiting to enter a synchronized method, because the test called another synchronized method first and still holds the lock. Since a thread blocked on entry to a synchronized method cannot be interrupted, the responder does not terminate. Before the uncleared flag issue was fixed, the receiver would blow up right away, join() on the receiver would return, and the synchronized method called by the test case would return. The blocked responder was not in the critical path, since join() on the responder never actually took effect. The responder would eventually unblock and terminate on its own.
A correct test would either increase the test timeout to be longer than the join timeout ("dfs.datanode.xceiver.stop.timeout.millis") or set the join timeout to be shorter than the test timeout.
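The blocking described in 1) can be reproduced outside of HDFS with a small sketch. The names below are hypothetical and the 200 ms wait is an arbitrary choice; the point is only that interrupt() has no visible effect on a thread that is waiting to enter a synchronized method.

```java
public class BlockedMonitorDemo {
    // Stand-in for the synchronized method the responder wants to enter.
    synchronized void criticalSection() { }

    /** Returns the responder's state after it was interrupted while waiting for the lock. */
    static Thread.State stateAfterInterrupt() {
        BlockedMonitorDemo shared = new BlockedMonitorDemo();
        Thread responder = new Thread(shared::criticalSection);
        Thread.State observed;
        try {
            synchronized (shared) { // the "test" holds the lock first
                responder.start();
                // Wait until the responder is parked on the monitor.
                while (responder.getState() != Thread.State.BLOCKED) {
                    Thread.yield();
                }
                responder.interrupt();   // sets the flag but cannot unblock it
                responder.join(200);     // waits on the Thread object, so the lock stays held
                observed = responder.getState();
            }
            responder.join();            // lock released, responder can finish now
        } catch (InterruptedException e) {
            throw new IllegalStateException("demo thread was interrupted", e);
        }
        return observed;
    }

    public static void main(String[] args) {
        System.out.println("state after interrupt: " + stateAfterInterrupt());
    }
}
```

The responder only makes progress once the lock holder leaves its synchronized block, which is why a join() on it has to run out its full timeout in the test.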
2) stopWriter() uses the same join() timeout as the receiver uses when joining on the responder. This means that even if the receiver's join() on the responder merely times out, stopWriter() will likely time out as well and fail. A shorter timeout should be used when joining on the responder.
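A rough simulation of the nested timeouts in 2), under stated assumptions: the thread bodies, the 50 ms of cleanup before the inner join, and all names are invented for illustration and do not mirror the actual BlockReceiver or stopWriter() code. The outer caller interrupts the receiver and joins it with one timeout, while the receiver joins a stuck responder with another.

```java
public class NestedJoinTimeoutDemo {
    /** Returns true if the outer join sees the receiver finish within outerTimeoutMs. */
    static boolean stopWriterSucceeds(long outerTimeoutMs, long responderJoinTimeoutMs) {
        // A responder that stays stuck for the duration of the demo.
        Thread responder = new Thread(() -> {
            try {
                Thread.sleep(10_000);
            } catch (InterruptedException ignored) {
                // only interrupted during demo cleanup
            }
        });
        responder.setDaemon(true);

        Thread receiver = new Thread(() -> {
            try {
                Thread.sleep(10_000); // "receiving" until asked to stop
            } catch (InterruptedException stop) {
                try {
                    Thread.sleep(50); // assumed cleanup overhead before the join
                    responder.join(responderJoinTimeoutMs);
                } catch (InterruptedException ignored) {
                    // not expected in this demo
                }
            }
        });
        receiver.setDaemon(true);

        responder.start();
        receiver.start();
        try {
            receiver.interrupt();           // ask the receiver to stop
            receiver.join(outerTimeoutMs);  // outer wait, as a stopWriter()-like caller would do
            return !receiver.isAlive();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            responder.interrupt();          // release the stuck responder
        }
    }

    public static void main(String[] args) {
        System.out.println("equal timeouts: " + stopWriterSucceeds(300, 300));
        System.out.println("shorter inner timeout: " + stopWriterSucceeds(300, 100));
    }
}
```

With equal timeouts the receiver's inner join expires slightly after the outer one, so the outer caller gives up first; with a clearly shorter inner timeout the receiver exits in time for the outer join to observe it.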