DataTree.serializeNode synchronizes on the DataNode it is about to serialize, then writes it out via OutputArchive.writeRecord, potentially to a network connection. Under default Linux TCP settings, a write to a network connection whose peer has completely disappeared will hang (blocking in the java.net.SocketOutputStream.socketWrite0 call) for over 15 minutes. During this time, any attempt to create, delete, or modify that DataNode will cause the leader to hang at the beginning of the request processor chain.
Additionally, any attempt to send a snapshot to a follower or write one to disk will hang, since serializing that node requires the same monitor.
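To make the mechanism concrete, here is a minimal, self-contained sketch (not ZooKeeper code) of a thread holding an object's monitor while a write blocks. `Node`, `StalledStream`, and the 500 ms wait are hypothetical: `StalledStream` stands in for a socket whose peer has vanished, and `DataOutputStream` stands in for `OutputArchive`. While the serializer is stuck inside `write()`, a second thread calling the synchronized `modify()` cannot enter the monitor, so its state is expected to be `BLOCKED` until the write is released.

```java
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.util.concurrent.CountDownLatch;

public class LockHeldDuringWrite {

    // Hypothetical stream simulating a vanished TCP peer: write() blocks until released.
    static final class StalledStream extends OutputStream {
        final CountDownLatch entered = new CountDownLatch(1);
        final CountDownLatch release = new CountDownLatch(1);

        @Override
        public void write(int b) throws IOException {
            entered.countDown();
            try {
                release.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IOException(e);
            }
        }
    }

    // Hypothetical stand-in for a DataNode; its own monitor guards its fields.
    static final class Node {
        long version;

        synchronized void modify() {
            version++;
        }
    }

    // Mirrors the problematic pattern: write while still holding the node's monitor.
    static void serializeHoldingLock(Node node, DataOutputStream out) throws IOException {
        synchronized (node) {
            out.writeLong(node.version);
        }
    }

    // Returns the state of a mutator thread observed while the serializer is stuck in write().
    static Thread.State mutatorStateDuringStall() throws Exception {
        Node node = new Node();
        StalledStream stream = new StalledStream();
        Thread serializer = new Thread(() -> {
            try {
                serializeHoldingLock(node, new DataOutputStream(stream));
            } catch (IOException ignored) {
                // not expected in this demo
            }
        });
        serializer.start();
        stream.entered.await();      // serializer now holds the monitor and is blocked in write()

        Thread mutator = new Thread(node::modify);
        mutator.start();
        mutator.join(500);           // give the mutator time to run; it cannot acquire the monitor
        Thread.State state = mutator.getState();

        stream.release.countDown();  // plays the role of the eventual TCP timeout
        serializer.join();
        mutator.join();
        return state;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(mutatorStateDuringStall());
    }
}
```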
Because the ping packets are sent by another thread, which is unaffected, followers never time out and elect a new leader, even though the cluster makes no progress until either the leader is killed or the TCP connection times out. This isn't strictly a deadlock, since it eventually resolves itself, but as mentioned above this takes more than 15 minutes with the default Linux TCP retry settings.
A simple solution: in DataTree.serializeNode, take a copy of the DataNode's contents inside the synchronized block (as is already done with its children), then call writeRecord on that copy after leaving the synchronized block.
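The proposed fix can be sketched the same way. This is a simplified model under stated assumptions, not the actual ZooKeeper patch: `Node`, `NodeCopy`, and `StalledStream` are hypothetical, `DataOutputStream` replaces `OutputArchive`, and only a data array and a version counter are copied. The copy is taken while holding the monitor, the monitor is released, and only then is anything written, so a mutator can finish while the serializer is still stuck.

```java
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.util.concurrent.CountDownLatch;

public class CopyThenWrite {

    // Hypothetical stream simulating a vanished TCP peer: write() blocks until released.
    static final class StalledStream extends OutputStream {
        final CountDownLatch entered = new CountDownLatch(1);
        final CountDownLatch release = new CountDownLatch(1);

        @Override
        public void write(int b) throws IOException {
            entered.countDown();
            try {
                release.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IOException(e);
            }
        }
    }

    // Hypothetical stand-in for a DataNode; its own monitor guards its fields.
    static final class Node {
        byte[] data = new byte[0];
        long version;

        synchronized void setData(byte[] newData) {
            data = newData;  // the array is replaced, never mutated in place
            version++;
        }
    }

    // Immutable copy of the fields to serialize, taken under the monitor.
    static final class NodeCopy {
        final byte[] data;
        final long version;

        NodeCopy(byte[] data, long version) {
            this.data = data;
            this.version = version;
        }
    }

    // Copy under the monitor, then write the copy after the monitor is released.
    static void serializeCopy(Node node, DataOutputStream out) throws IOException {
        NodeCopy copy;
        synchronized (node) {
            // Sharing the array reference is safe here only because setData replaces
            // the array instead of mutating it; otherwise clone it inside this block.
            copy = new NodeCopy(node.data, node.version);
        }
        out.writeLong(copy.version);
        out.writeInt(copy.data.length);
        out.write(copy.data);
    }

    // Returns true if a mutator completes while the serializer is stuck in write().
    static boolean mutatorFinishesDuringStall() throws Exception {
        Node node = new Node();
        StalledStream stream = new StalledStream();
        Thread serializer = new Thread(() -> {
            try {
                serializeCopy(node, new DataOutputStream(stream));
            } catch (IOException ignored) {
                // not expected in this demo
            }
        });
        serializer.start();
        stream.entered.await();      // serializer is blocked in write(), monitor already released

        Thread mutator = new Thread(() -> node.setData(new byte[] {1, 2, 3}));
        mutator.start();
        mutator.join(5_000);
        boolean finished = !mutator.isAlive();

        stream.release.countDown();  // plays the role of the eventual TCP timeout
        serializer.join();
        mutator.join();
        return finished;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(mutatorFinishesDuringStall());
    }
}
```

This does not make the stalled write return any sooner: the serializing thread, and the snapshot it is sending, still waits for the TCP timeout. What changes is that the node's monitor is no longer held during that wait, so request processing and other serializations of the same node can proceed. Whether sharing the data array instead of cloning it is safe in ZooKeeper itself depends on whether DataNode's data is ever mutated in place, which this sketch does not check.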