Details
- Bug
- Status: Resolved
- Major
- Resolution: Not A Problem
- 3.0
- None
- The 2 nodes cluster (1 CMG node).
- Docs Required, Release Notes Required
Description
Steps to reproduce:
1. Start a cluster of 2 nodes with one CMG node.
2. Create a zone with the number of replicas equal to the number of nodes (2).
3. Create 10 tables inside the zone.
4. Insert 100 rows into every table.
5. Wait until the local state of every table*partition*node combination is "HEALTHY".
6. Wait until the global state of every table*partition*node combination is "AVAILABLE".
7. Kill the non-CMG node with kill -9.
8. Assert that the physical topology contains only 1 alive node.
9. Assert that the logical topology contains only 1 alive node.
10. Wait until the local state of every table*partition*node combination is "HEALTHY".
11. Wait until the global state of every table*partition*node combination is "READ_ONLY".
12. Execute a select query over JDBC, connecting to the alive CMG node.
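The "wait until ... state is" steps above amount to polling a condition with a timeout. A minimal sketch of such a helper follows, assuming plain JDK APIs only; the class name AwaitSketch and the method await are hypothetical and are not taken from Ignite or from the test framework in this report. The lambda in main is a stand-in for a real check of partition states.

```java
import java.time.Duration;
import java.util.function.BooleanSupplier;

/** Hypothetical polling helper; not part of Ignite or of the reporter's test framework. */
final class AwaitSketch {
    /**
     * Polls the condition until it holds or the timeout elapses.
     * Returns true if the condition became true in time, false on timeout or interruption.
     */
    static boolean await(BooleanSupplier condition, Duration timeout, Duration interval) {
        long deadline = System.nanoTime() + timeout.toNanos();
        while (true) {
            if (condition.getAsBoolean()) {
                return true;
            }
            if (System.nanoTime() - deadline >= 0) {
                return false;
            }
            try {
                Thread.sleep(interval.toMillis());
            } catch (InterruptedException e) {
                // Preserve the interrupt flag and give up waiting.
                Thread.currentThread().interrupt();
                return false;
            }
        }
    }

    public static void main(String[] args) {
        // Stand-in for "all partitions report HEALTHY": becomes true on the third poll.
        int[] polls = {0};
        boolean ok = await(() -> ++polls[0] >= 3, Duration.ofSeconds(5), Duration.ofMillis(10));
        System.out.println("condition met: " + ok + ", polls: " + polls[0]);
    }
}
```

Checking the condition before checking the deadline guarantees at least one evaluation even with a zero timeout.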
Expected:
Data is returned.
Actual:
The following exception occurs at step 12:
Failed to get the primary replica [tablePartitionId=10_part_1]
java.sql.SQLException: Failed to get the primary replica [tablePartitionId=10_part_1]
    at org.apache.ignite.internal.jdbc.proto.IgniteQueryErrorCode.createJdbcSqlException(IgniteQueryErrorCode.java:57)
    at org.apache.ignite.internal.jdbc.JdbcStatement.execute0(JdbcStatement.java:154)
    at org.apache.ignite.internal.jdbc.JdbcStatement.executeQuery(JdbcStatement.java:111)
    at org.gridgain.ai3tests.tests.teststeps.JdbcSteps.executeQuery(JdbcSteps.java:91)
    at org.gridgain.ai3tests.tests.failover.ClusterFailoverTestBase.getActualResult(ClusterFailoverTestBase.java:338)
    at org.gridgain.ai3tests.tests.failover.ClusterFailoverTestBase.assertDataIsFilledWithoutErrors(ClusterFailoverTestBase.java:169)
    at org.gridgain.ai3tests.tests.failover.ClusterFailover2NodesTest.singleKillAndCheckOtherNodeWorks(ClusterFailover2NodesTest.java:123)
    at java.base/java.lang.reflect.Method.invoke(Method.java:566)
    at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
    at java.base/java.lang.Thread.run(Thread.java:834)
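When triaging a failure like this one, it can help to pull the affected partition out of the exception message. The sketch below assumes the message format seen in this report ("Failed to get the primary replica [tablePartitionId=...]"); the class PrimaryReplicaErrorSketch and the method failedPartition are hypothetical names, and the format may differ in other Ignite versions.

```java
import java.sql.SQLException;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical triage helper based only on the message text shown in this report. */
final class PrimaryReplicaErrorSketch {
    // Captures the value after "tablePartitionId=" up to a bracket, comma, or whitespace.
    private static final Pattern PARTITION_ID =
            Pattern.compile("tablePartitionId=([^\\]\\s,]+)");

    /** Returns the partition id if the exception is a primary-replica failure, else empty. */
    static Optional<String> failedPartition(SQLException e) {
        String msg = e.getMessage();
        if (msg == null || !msg.contains("Failed to get the primary replica")) {
            return Optional.empty();
        }
        Matcher m = PARTITION_ID.matcher(msg);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        SQLException e = new SQLException(
                "Failed to get the primary replica [tablePartitionId=10_part_1]");
        System.out.println(failedPartition(e).orElse("not a primary replica failure"));
    }
}
```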
The server logs contain continuous errors:
2024-06-14 18:10:58:719 +0000 [WARNING][%ClusterFailover2NodesTest_cluster_0%Raft-Group-Client-7][RaftGroupServiceImpl] Recoverable error during the request occurred (will be retried on the randomly selected node) [request=ReadIndexRequestImpl [entriesList=null, groupId=28_part_1, peerId=ClusterFailover2NodesTest_cluster_1, serverId=ClusterFailover2NodesTest_cluster_1], peer=Peer [consistentId=ClusterFailover2NodesTest_cluster_1, idx=0], newPeer=Peer [consistentId=ClusterFailover2NodesTest_cluster_1, idx=0]].
java.util.concurrent.CompletionException: java.net.ConnectException: Peer ClusterFailover2NodesTest_cluster_1 is unavailable
    at java.base/java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:331)
    at java.base/java.util.concurrent.CompletableFuture.uniComposeStage(CompletableFuture.java:1099)
    at java.base/java.util.concurrent.CompletableFuture.thenCompose(CompletableFuture.java:2235)
    at org.apache.ignite.internal.raft.RaftGroupServiceImpl.sendWithRetry(RaftGroupServiceImpl.java:558)
    at org.apache.ignite.internal.raft.RaftGroupServiceImpl.lambda$handleErrorResponse$44(RaftGroupServiceImpl.java:653)
    at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
    at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
    at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
    at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: java.net.ConnectException: Peer ClusterFailover2NodesTest_cluster_1 is unavailable
    at org.apache.ignite.internal.raft.RaftGroupServiceImpl.resolvePeer(RaftGroupServiceImpl.java:806)
    at org.apache.ignite.internal.raft.RaftGroupServiceImpl.sendWithRetry(RaftGroupServiceImpl.java:557)
    ... 7 more
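The ticket does not say why it was resolved as "Not A Problem". One possible reading, offered here only as an assumption, is that each partition's Raft group follows the usual majority rule: with 2 replicas the majority is 2, so a single surviving node cannot form a majority and requests to the group keep being retried. The arithmetic can be sketched as follows; QuorumSketch and its methods are hypothetical names, not Ignite APIs.

```java
/** Hypothetical illustration of the standard Raft majority rule; not an Ignite API. */
final class QuorumSketch {
    /** Smallest number of replicas that forms a strict majority of the group. */
    static int majority(int replicas) {
        return replicas / 2 + 1;
    }

    /** True if the alive replicas can still form a majority. */
    static boolean hasMajority(int replicas, int alive) {
        return alive >= majority(replicas);
    }

    public static void main(String[] args) {
        // Scenario from this report: 2 replicas, 1 node killed.
        System.out.println("2 replicas, 1 alive: majority kept = " + hasMajority(2, 1));
        // For comparison: 3 replicas, 1 node killed.
        System.out.println("3 replicas, 2 alive: majority kept = " + hasMajority(3, 2));
    }
}
```

Under this assumption, the 2-node case loses its majority after one failure, while a 3-replica group would keep it.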
Server logs are in the attachments.
Attachments
Issue Links
- is superseded by: IGNITE-23087 Wrong partitions status if 1 node of 2 nodes cluster is down (Open)
- supersedes: IGNITE-22187 Cluster of 2 or 3 nodes doesn't work if one node is down (Resolved)