The monitor-to-monitor socket communication has no reconnect logic to handle a network reset or transient network errors.
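The note does not say what form the missing reconnect logic should take. Below is a minimal sketch that assumes a capped exponential backoff between attempts. `backoff_delay`, `reconnect_with_backoff`, and the `try_connect`/`sleep_fn` callbacks are hypothetical names and are not part of the existing monitor code. The connect and sleep steps are injected so the retry policy can be exercised without a network.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <functional>

using Millis = std::chrono::milliseconds;

// Hypothetical helper: delay before retry number `attempt` (0-based),
// doubling from `base` and never exceeding `cap`.
Millis backoff_delay(int attempt, Millis base, Millis cap) {
    assert(attempt >= 0);
    Millis delay = base;
    for (int i = 0; i < attempt && delay < cap; ++i) {
        delay *= 2;
    }
    return std::min(delay, cap);
}

// Hypothetical reconnect loop for one monitor-to-monitor socket.
// try_connect: tries to (re)open the peer connection, true on success.
// sleep_fn: waits for the given delay (real or faked in a test).
// Returns the 1-based attempt that succeeded, or -1 if all attempts failed.
int reconnect_with_backoff(const std::function<bool()>& try_connect,
                           const std::function<void(Millis)>& sleep_fn,
                           int max_attempts, Millis base, Millis cap) {
    for (int attempt = 0; attempt < max_attempts; ++attempt) {
        if (try_connect()) {
            return attempt + 1;
        }
        // Do not wait after the final failed attempt.
        if (attempt + 1 < max_attempts) {
            sleep_fn(backoff_delay(attempt, base, cap));
        }
    }
    return -1;
}
```

In production, `sleep_fn` could be `[](Millis d) { std::this_thread::sleep_for(d); }` (from `<thread>`). With a 500 ms base and a 4 s cap, nine attempts wait 23.5 s in total between them. That is slightly longer than the ~20 second reset window described below, so the attempt count and cap can be tuned against that window.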
• During a ~20-second network reset window, open sockets detect no errors.
o The open sockets are dead, but the TCP/IP stack gives no indication that they are in an error condition.
• Once the network is restored, the ZooKeeper client library reports CONNECTIONLOSS.
o However, the client library's reconnect logic reestablishes the connection with the quorum.
• When the EPOLL expiration time is reached, the EPOLL logic reports “Not heard from peer=n” and treats the peer as a down node.
o The node-down logic deletes the corresponding znode (CZClient::WatchNodeDelete()).
o All monitor processes continually check for expired znodes for every node in the cluster, including their own, and handle an expired znode as a down node.
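The EPOLL expiration check can be pictured as a per-peer "last heard" table. Below is a minimal sketch that assumes the monitor records a timestamp whenever data arrives from a peer. `PeerLiveness`, `heard_from`, and `check_expired` are hypothetical names, and the real expiration value is not stated in this note. The sketch shows why a silently dead socket and a peer behind a network reset look identical: in both cases the timestamp stops advancing. If the expiration is shorter than the ~20 second reset window, the peer is reported as down.

```cpp
#include <cassert>
#include <chrono>
#include <map>
#include <string>
#include <vector>

using Clock = std::chrono::steady_clock;

// Hypothetical per-peer liveness table for one monitor process. The EPOLL
// loop would call heard_from() whenever data arrives on a peer's socket and
// check_expired() whenever epoll_wait() returns, including on timeout.
class PeerLiveness {
public:
    explicit PeerLiveness(std::chrono::milliseconds expiration)
        : expiration_(expiration) {
        assert(expiration.count() > 0);
    }

    void heard_from(int peer, Clock::time_point now) { last_heard_[peer] = now; }

    // Returns a "Not heard from peer=n" line for each peer whose last
    // message is older than the expiration, and forgets that peer so it is
    // reported only once. The caller would then run its node-down handling,
    // e.g. deleting the peer's znode.
    std::vector<std::string> check_expired(Clock::time_point now) {
        std::vector<std::string> reports;
        for (auto it = last_heard_.begin(); it != last_heard_.end();) {
            if (now - it->second > expiration_) {
                reports.push_back("Not heard from peer=" + std::to_string(it->first));
                it = last_heard_.erase(it);
            } else {
                ++it;
            }
        }
        return reports;
    }

private:
    std::chrono::milliseconds expiration_;
    std::map<int, Clock::time_point> last_heard_;
};
```

Passing the current time explicitly, rather than reading the clock inside the class, keeps the expiry decision deterministic. In a live loop the caller would pass `Clock::now()` after each `epoll_wait()` call, using an `epoll_wait()` timeout no longer than the expiration.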