FAILURE DETECTION DELAY in this test:
    ServerImpl.CON_CHECK_INTERVAL + 2 * failureDetectionTimeout + 100ms (in isConnectionRefused()) + 200ms (in markLastFailedNodeAlive()) + code_delays.

HOW THE WORST DELAY HAPPENS IN THIS CASE:

Consider 3 nodes with orders 1, 2 and 3. Node2 is the failing node in this example. 'Node fails' means it stops answering socket operations but raises no errors, as with long GC pauses.

1) Node1 successfully pings Node2. As the worst case, suppose Node2 fails right after this.

2) Node1 waits for at most CON_CHECK_INTERVAL and pings Node2 again.
    Timeout on all operations on Node1 (send ping message, read response): failureDetectionTimeout.

3) Node1 hasn't received a ping response from Node2 within failureDetectionTimeout.
    We have lost at this moment: CON_CHECK_INTERVAL + failureDetectionTimeout.
    Node1 opens a connection to Node3, asking for permanent collaboration instead of Node2:
    - Node1 opens a socket to Node3.
    - Node1 sends IgniteUtils.IGNITE_HEADER to Node3.
    - Node1 sends TcpDiscoveryHandshakeRequest to Node3.
    - Node1 waits for TcpDiscoveryHandshakeResponse from Node3.
    Timeout on all these operations on Node1 (connect, send TcpDiscoveryHandshakeRequest and receive TcpDiscoveryHandshakeResponse): failureDetectionTimeout.

4) Node3 accepts the connection from Node1 but doesn't think its previous node, Node2, has failed.
    - Node3 reads IgniteUtils.IGNITE_HEADER from Node1.
    - Node3 reads the first message from Node1.
    - Node3 checks the connection to Node2 in ServerImpl.isConnectionRefused() because the last message from Node2 was received within the last 2 * CON_CHECK_INTERVAL:
        // We got message from previous in less than double CON_CHECK_INTERVAL.
        boolean ok = rcvdTime + CON_CHECK_INTERVAL * 2 >= now;
      Here ok == true.
    ServerImpl.isConnectionRefused() has a hardcoded timeout of 100ms:
        try (Socket sock = new Socket()) {
            sock.connect(addr, 100);
        }
    We suppose this timeout expires and Node3 takes an additional 100ms. Node1 is still waiting within failureDetectionTimeout.
    Timeouts on operations on Node3:
    - spi.networkTimeout 5000ms to read IgniteUtils.IGNITE_HEADER from Node1
    - another spi.networkTimeout 5000ms to read the first message (TcpDiscoveryHandshakeRequest) from Node1
    Timeouts on operations on Node1: all within the same failureDetectionTimeout of step #3.

5) Node3 gets an IOException (a timeout) on sock.connect(addr, 100). For some reason, ServerImpl.isConnectionRefused() treats this as an alive connection:
        catch (IOException e) {
            return false; // False means connection is not refused.
        }
    Node3 denies the permanent connection with Node1 and sends TcpDiscoveryHandshakeResponse to it.
    Timeouts on operations on Node3:
    - 100ms in isConnectionRefused()
    - spi.getEffectiveSocketTimeout(srvSock) == failureDetectionTimeout to write TcpDiscoveryHandshakeResponse to Node1
    Timeouts on operations on Node1: the same failureDetectionTimeout as at step #3.
    We have lost at this moment: CON_CHECK_INTERVAL + failureDetectionTimeout + 100ms in isConnectionRefused().

6) Node1 receives TcpDiscoveryHandshakeResponse denying the topology change. Node1 stops waiting on the failureDetectionTimeout of step #3.
    For some reason, Node1 then waits for 200ms in ServerImpl.CrossRingMessageSendState.markLastFailedNodeAlive():
        try {
            Thread.sleep(200);
        }
        catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    Timeouts on operations: 200ms in markLastFailedNodeAlive().
    We have lost at this moment: CON_CHECK_INTERVAL + failureDetectionTimeout + 100ms in isConnectionRefused() + 200ms in markLastFailedNodeAlive().

7) Node1 tries to connect to the failed Node2 again, as at step #3.
    Timeout on all operations: failureDetectionTimeout.

8) Node2 does not send TcpDiscoveryHandshakeResponse, or Node1 cannot even open a socket to Node2. For the second time, Node1 is unable to connect to the failed Node2 within failureDetectionTimeout.
    We have lost at this moment: CON_CHECK_INTERVAL + failureDetectionTimeout + 100ms in isConnectionRefused() + 200ms in markLastFailedNodeAlive() + failureDetectionTimeout.

9) Node1 asks Node3 for a permanent connection again.
    - Node1 sends IgniteUtils.IGNITE_HEADER to Node3.
    - Node1 sends TcpDiscoveryHandshakeRequest to Node3.
    - Node3 reads IgniteUtils.IGNITE_HEADER from Node1.
    - Node3 reads the first message from Node1.
    Node3 accepts the connection from Node1 but won't check whether Node2 is available, because too much time has passed since the last message from Node2:
        boolean ok = rcvdTime + CON_CHECK_INTERVAL * 2 >= now;
      Here ok == false.
    Node3 answers with a TcpDiscoveryHandshakeResponse in which previousNodeAlive() == false. Node1 and Node3 establish a permanent connection.
    Timeouts on operations on Node3:
    - spi.networkTimeout 5000ms to read IgniteUtils.IGNITE_HEADER from Node1
    - another spi.networkTimeout 5000ms to read the first message (TcpDiscoveryHandshakeRequest) from Node1
    - spi.getEffectiveSocketTimeout(srvSock) == failureDetectionTimeout to write TcpDiscoveryHandshakeResponse to Node1
    - 100ms in isConnectionRefused()
    Timeouts on operations on Node1: all within the failureDetectionTimeout of step #7.

10) Node1 detects that Node2 has failed and sends TcpDiscoveryNodeFailedMessage across the ring.
    Sending TcpDiscoveryNodeFailedMessage is considered the final point of node failure detection.
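
EXAMPLE: the arithmetic of the steps above can be sketched in a few lines of Java. This is only an illustration of how the "We lost at this moment" totals accumulate, not code from ServerImpl. The class name, method names and the sample values passed in main() are hypothetical; the 100ms and 200ms constants are the hardcoded delays quoted in steps 4 and 6, and code_delays is left out because it cannot be computed.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Illustrative sketch only: accumulates the worst-case delay step by step. */
public class WorstCaseDetectionDelay {
    /** Hardcoded connect timeout in isConnectionRefused(), as quoted in step 4. */
    static final long CONNECTION_REFUSED_CHECK_MS = 100;

    /** Hardcoded sleep in markLastFailedNodeAlive(), as quoted in step 6. */
    static final long MARK_ALIVE_SLEEP_MS = 200;

    /** Cumulative time lost after each contributing step (code_delays excluded). */
    static Map<String, Long> timeline(long conCheckIntervalMs, long failureDetectionTimeoutMs) {
        Map<String, Long> lost = new LinkedHashMap<>();
        long total = 0;

        total += conCheckIntervalMs;           // Step 2: wait before the next ping.
        lost.put("step 2", total);

        total += failureDetectionTimeoutMs;    // Step 3: ping to Node2 times out.
        lost.put("step 3", total);

        total += CONNECTION_REFUSED_CHECK_MS;  // Step 5: Node3 probes Node2.
        lost.put("step 5", total);

        total += MARK_ALIVE_SLEEP_MS;          // Step 6: sleep on Node1.
        lost.put("step 6", total);

        total += failureDetectionTimeoutMs;    // Step 8: second attempt to reach Node2 times out.
        lost.put("step 8", total);

        return lost;
    }

    /** Closed form of the same sum, matching the formula at the top of the description. */
    static long worstCaseMillis(long conCheckIntervalMs, long failureDetectionTimeoutMs) {
        return conCheckIntervalMs + 2 * failureDetectionTimeoutMs
            + CONNECTION_REFUSED_CHECK_MS + MARK_ALIVE_SLEEP_MS;
    }

    public static void main(String[] args) {
        long conCheckIntervalMs = 500;          // Assumed sample value.
        long failureDetectionTimeoutMs = 1000;  // Assumed sample value.

        timeline(conCheckIntervalMs, failureDetectionTimeoutMs)
            .forEach((step, ms) -> System.out.println(step + ": lost " + ms + "ms"));

        System.out.println("worst case without code_delays: "
            + worstCaseMillis(conCheckIntervalMs, failureDetectionTimeoutMs) + "ms");
    }
}
```

With the sample values above the running totals are 500, 1500, 1600, 1800 and 2800ms, so a test asserting on detection time would need to allow at least 2800ms plus some margin for code_delays.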