Details
-
Bug
-
Status: Resolved
-
Major
-
Resolution: Cannot Reproduce
-
3.0.0-beta1
-
None
-
A 2-node or 3-node cluster running locally.
-
Docs Required, Release Notes Required
Description
Steps to reproduce:
1. Create a zone with a replica count equal to the number of nodes (2 or 3, respectively).
2. Create 10 tables inside the zone.
3. Insert 100 rows into every table.
4. Wait until the local state is "HEALTHY" for every table, partition, and node.
5. Wait until the global state is "AVAILABLE" for every table, partition, and node.
6. Kill the first node with kill -9.
7. Assert that the local state is "HEALTHY" for every table, partition, and node.
8. Wait until the global state is "READ_ONLY" (2-node cluster) or "DEGRADED" (3-node cluster) for every table, partition, and node.
9. Execute a select query over JDBC, connecting to the second node (which is alive).
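The global states expected after the node is killed are consistent with a majority rule: with 2 replicas, losing one node loses the majority ("READ_ONLY"), while with 3 replicas the majority survives ("DEGRADED"). The sketch below only illustrates that reasoning. The class, enum, and method names are hypothetical and are not Ignite APIs, and the "UNAVAILABLE" state for zero alive replicas is an assumption that does not appear in this ticket.

```java
public class GlobalStateSketch {
    /** Hypothetical mirror of the global states named in the steps; UNAVAILABLE is an assumption. */
    enum GlobalState { AVAILABLE, DEGRADED, READ_ONLY, UNAVAILABLE }

    /**
     * Derives the global state a test might expect from the replica count
     * and the number of replicas that are still alive, using a majority rule.
     */
    static GlobalState expectedGlobalState(int replicas, int aliveReplicas) {
        if (replicas <= 0 || aliveReplicas < 0 || aliveReplicas > replicas) {
            throw new IllegalArgumentException("Invalid replica counts: " + aliveReplicas + "/" + replicas);
        }
        if (aliveReplicas == replicas) {
            return GlobalState.AVAILABLE;   // every replica is alive
        }
        if (aliveReplicas == 0) {
            return GlobalState.UNAVAILABLE; // assumption: nothing left to serve reads
        }
        int majority = replicas / 2 + 1;    // smallest number of replicas that forms a quorum
        return aliveReplicas >= majority ? GlobalState.DEGRADED : GlobalState.READ_ONLY;
    }

    public static void main(String[] args) {
        // 2-node cluster, one node killed: 1 of 2 replicas alive, no majority.
        System.out.println(expectedGlobalState(2, 1)); // READ_ONLY
        // 3-node cluster, one node killed: 2 of 3 replicas alive, majority kept.
        System.out.println(expectedGlobalState(3, 2)); // DEGRADED
    }
}
```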
Expected:
Data is returned.
Actual:
The select query at step 9 hangs indefinitely.
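Because the query never returns, a reproducer that calls it directly also hangs. One way to turn the hang into a test failure is to run the call with a bounded wait. The following is a minimal sketch using only java.util.concurrent. The class and method names are hypothetical, and the Callable stands in for the real JDBC select, which is not shown here.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class BoundedQuerySketch {
    /**
     * Runs the task on a separate daemon thread and waits at most the given time.
     * Throws TimeoutException if the task has not finished by then.
     */
    static <T> T callWithTimeout(Callable<T> task, long timeout, TimeUnit unit) throws Exception {
        ExecutorService executor = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "bounded-query");
            t.setDaemon(true); // a stuck task must not keep the JVM alive
            return t;
        });
        Future<T> future = executor.submit(task);
        try {
            return future.get(timeout, unit);
        } catch (TimeoutException e) {
            future.cancel(true); // interrupt the stuck task, best effort
            throw e;
        } finally {
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        // Stand-in for a select that returns promptly.
        System.out.println(callWithTimeout(() -> "row", 1, TimeUnit.SECONDS));

        // Stand-in for the frozen select from step 9.
        try {
            callWithTimeout(() -> {
                Thread.sleep(60_000);
                return "never";
            }, 200, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            System.out.println("query timed out");
        }
    }
}
```

In a real reproducer the Callable would wrap the JDBC statement execution. Whether interrupting the thread actually cancels the underlying query depends on the driver and is not something this ticket confirms.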
The errors on the server side:
2024-04-30 00:04:02:965 +0200 [ERROR][%ClusterFailover3NodesTest_cluster_1%JRaft-StepDownTimer-8][AbstractClientService] Fail to connect ClusterFailover3NodesTest_cluster_0, exception: java.util.concurrent.TimeoutException.
2024-04-30 00:04:02:965 +0200 [ERROR][%ClusterFailover3NodesTest_cluster_1%JRaft-StepDownTimer-8][ReplicatorGroupImpl] Fail to check replicator connection to peer=ClusterFailover3NodesTest_cluster_0, replicatorType=Follower.
2024-04-30 00:04:02:980 +0200 [ERROR][%ClusterFailover3NodesTest_cluster_1%JRaft-Response-Processor-1][AbstractClientService] Fail to connect ClusterFailover3NodesTest_cluster_0, exception: java.util.concurrent.TimeoutException.
2024-04-30 00:04:02:980 +0200 [ERROR][%ClusterFailover3NodesTest_cluster_1%JRaft-Response-Processor-1][ReplicatorGroupImpl] Fail to check replicator connection to peer=ClusterFailover3NodesTest_cluster_0, replicatorType=Follower.
2024-04-30 00:04:02:981 +0200 [ERROR][%ClusterFailover3NodesTest_cluster_1%JRaft-Response-Processor-1][NodeImpl] Fail to add a replicator, peer=ClusterFailover3NodesTest_cluster_0.
2024-04-30 00:04:02:981 +0200 [WARNING][ClusterFailover3NodesTest_cluster_1-client-8][RaftGroupServiceImpl] Recoverable error during the request occurred (will be retried on the randomly selected node) [request=WriteActionRequestImpl [command=[0, 9, 41, -117, -128, -8, -15, -83, -4, -54, -57, 1], deserializedCommand=SafeTimeSyncCommandImpl [safeTimeLong=112356769098760202], groupId=26_part_10], peer=Peer [consistentId=ClusterFailover3NodesTest_cluster_0, idx=0], newPeer=Peer [consistentId=ClusterFailover3NodesTest_cluster_1, idx=0]].
java.util.concurrent.CompletionException: io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection refused: no further information: /192.168.100.5:3344
    at java.base/java.util.concurrent.CompletableFuture.encodeRelay(CompletableFuture.java:367)
    at java.base/java.util.concurrent.CompletableFuture.completeRelay(CompletableFuture.java:376)
    at java.base/java.util.concurrent.CompletableFuture$UniCompose.tryFire(CompletableFuture.java:1074)
    at java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506)
    at java.base/java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2088)
    at org.apache.ignite.internal.network.netty.NettyUtils.lambda$toCompletableFuture$0(NettyUtils.java:74)
    at io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:590)
    at io.netty.util.concurrent.DefaultPromise.notifyListeners0(DefaultPromise.java:583)
    at io.netty.util.concurrent.DefaultPromise.notifyListenersNow(DefaultPromise.java:559)
    at io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:492)
    at io.netty.util.concurrent.DefaultPromise.setValue0(DefaultPromise.java:636)
    at io.netty.util.concurrent.DefaultPromise.setFailure0(DefaultPromise.java:629)
    at io.netty.util.concurrent.DefaultPromise.tryFailure(DefaultPromise.java:118)
    at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.fulfillConnectPromise(AbstractNioChannel.java:326)
    at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:342)
    at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:776)
    at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:724)
    at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:650)
    at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:562)
    at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:997)
    at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
    at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
    at java.base/java.lang.Thread.run(Thread.java:834)
Caused by: io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection refused: no further information: /192.168.100.5:3344
Caused by: java.net.ConnectException: Connection refused: no further information
    at java.base/sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at java.base/sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:779)
    at io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:337)
    at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:339)
    at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:776)
    at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:724)
    at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:650)
    at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:562)
    at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:997)
    at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
    at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
    at java.base/java.lang.Thread.run(Thread.java:834)
2024-04-30 00:04:02:982 +0200 [ERROR][%ClusterFailover3NodesTest_cluster_1%JRaft-StepDownTimer-0][AbstractClientService] Fail to connect ClusterFailover3NodesTest_cluster_0, exception: java.util.concurrent.TimeoutException.
Issue Links
- is superseded by
-
IGNITE-22187 Cluster of 2 or 3 nodes doesn't work if one node is down
- Resolved