Ignite / IGNITE-22187

Cluster of 2 or 3 nodes doesn't work if one node is down

Details

    • Bug
    • Status: Resolved
    • Major
    • Resolution: Not A Problem
    • 3.0, 3.0.0-beta1
    • None
    • A 2- or 3-node cluster running locally.

    • Docs Required, Release Notes Required

    Description

      Steps to reproduce:

      1. Create a zone whose replica count equals the number of nodes (2 or 3, respectively).
      2. Create 10 tables in the zone.
      3. Insert 100 rows into every table.
      4. Wait until the local state of all tables*partitions*nodes is "HEALTHY".
      5. Wait until the global state of all tables*partitions*nodes is "AVAILABLE".
      6. Kill the first node with kill -9.
      7. Assert that the local state of all tables*partitions*nodes is "HEALTHY".
      8. Wait until the global state of all tables*partitions*nodes is "READ_ONLY" for the 2-node cluster or "DEGRADED" for the 3-node cluster.
      9. Execute a select query over JDBC, connecting to the second node (which is alive).
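
      The global states expected in step 8 are consistent with simple majority arithmetic: with 3 replicas and one node killed, 2 of 3 replicas remain (a majority), while with 2 replicas and one node killed, 1 of 2 remains (no majority). The following is a minimal sketch of that reasoning only, not Ignite's implementation. The class and method names are hypothetical, and the UNAVAILABLE value for "no replicas alive" is an assumption that does not appear in this report.

```java
// Hypothetical helper: derives the global partition state a test might expect
// from the replica count and the number of replicas still alive.
public final class ExpectedGlobalState {
    public enum State { AVAILABLE, DEGRADED, READ_ONLY, UNAVAILABLE }

    public static State expected(int replicas, int alive) {
        if (replicas <= 0 || alive < 0 || alive > replicas) {
            throw new IllegalArgumentException("replicas=" + replicas + ", alive=" + alive);
        }
        if (alive == 0) {
            return State.UNAVAILABLE; // assumption: nothing left to serve reads
        }
        if (alive == replicas) {
            return State.AVAILABLE;
        }
        int majority = replicas / 2 + 1;
        // A surviving majority can still accept writes; a minority can only serve reads.
        return alive >= majority ? State.DEGRADED : State.READ_ONLY;
    }

    public static void main(String[] args) {
        System.out.println(expected(3, 2)); // 3-node cluster, one node killed
        System.out.println(expected(2, 1)); // 2-node cluster, one node killed
    }
}
```

      Under these assumptions the two calls in main yield DEGRADED and READ_ONLY, which are the values step 8 waits for.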

      Expected:

      Data is returned.
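
      For reference, the check in step 9 could look roughly like the sketch below. It assumes the Ignite JDBC driver is on the classpath and that the surviving node accepts client connections at a URL of the form jdbc:ignite:thin://host:port. The class name, the host, port and table arguments, and the COUNT(*) query are illustrative and are not taken from the test in the stack trace.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

// Hypothetical sketch of step 9: query a table through the node that is still alive.
public final class SurvivorQuery {
    // Builds the thin-client JDBC URL for one specific node (assumed URL form).
    public static String jdbcUrl(String host, int port) {
        return "jdbc:ignite:thin://" + host + ":" + port;
    }

    // Returns the row count of the given table; the table name must be trusted input,
    // because it is concatenated into the SQL text.
    public static long countRows(String url, String table) throws SQLException {
        try (Connection conn = DriverManager.getConnection(url);
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("SELECT COUNT(*) FROM " + table)) {
            rs.next();
            return rs.getLong(1);
        }
    }

    public static void main(String[] args) throws SQLException {
        if (args.length < 3) {
            System.err.println("usage: SurvivorQuery <host> <port> <table>");
            return;
        }
        String url = jdbcUrl(args[0], Integer.parseInt(args[1]));
        System.out.println(countRows(url, args[2]));
    }
}
```

      With the data set from steps 2 and 3, the expected result would be 100 for each of the 10 tables.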

      Actual:
      At step 7, the REST API returns an error:

      {"title":"Internal Server Error","status":500,"code":"IGN-RECOVERY-3","type":null,"detail":"io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection refused: /172.120.6.2:3344","node":null,"traceId":"2acb52fc-3275-411b-a4de-45f14873f15c","invalidParams":null}
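
      A test harness that wants to tell this failure apart from other errors could inspect the "code" field of the response. The sketch below is a dependency-free illustration that uses a regular expression; the class and method names are hypothetical, it does not unescape JSON string escapes, and a real JSON parser would be the more robust choice.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: pulls a string-valued field out of a flat JSON error body.
public final class ProblemField {
    // Returns the raw (still escaped) string value of the field,
    // or empty if the field is absent or is not a string (for example null or a number).
    public static Optional<String> stringField(String json, String name) {
        Pattern p = Pattern.compile(
                "\"" + Pattern.quote(name) + "\"\\s*:\\s*\"((?:[^\"\\\\]|\\\\.)*)\"");
        Matcher m = p.matcher(json);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        // Shortened copy of the response shown above.
        String body = "{\"title\":\"Internal Server Error\",\"status\":500,"
                + "\"code\":\"IGN-RECOVERY-3\",\"type\":null}";
        System.out.println(stringField(body, "code").orElse("<none>"));
    }
}
```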

      The server logs show continuous errors:

      2024-05-08 10:37:19:796 +0200 [ERROR][%ClusterFailover3NodesTest_cluster_1%JRaft-StepDownTimer-9][AbstractClientService] Fail to connect ClusterFailover3NodesTest_cluster_0, exception: java.net.ConnectException.
      2024-05-08 10:37:19:796 +0200 [ERROR][%ClusterFailover3NodesTest_cluster_1%JRaft-StepDownTimer-9][ReplicatorGroupImpl] Fail to check replicator connection to peer=ClusterFailover3NodesTest_cluster_0, replicatorType=Follower.
      2024-05-08 10:37:19:796 +0200 [ERROR][%ClusterFailover3NodesTest_cluster_1%JRaft-StepDownTimer-12][AbstractClientService] Fail to connect ClusterFailover3NodesTest_cluster_0, exception: java.net.ConnectException.
      2024-05-08 10:37:19:796 +0200 [ERROR][%ClusterFailover3NodesTest_cluster_1%JRaft-StepDownTimer-12][ReplicatorGroupImpl] Fail to check replicator connection to peer=ClusterFailover3NodesTest_cluster_0, replicatorType=Follower. 

      If steps 7 and 8 are skipped, the following exception occurs at step 9:

      java.sql.SQLException: Unable to send fragment [targetNode=ClusterFailover3NodesTest_cluster_0, fragmentId=1, cause=io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection refused: no further information: /192.168.100.5:3344]
          at org.apache.ignite.internal.jdbc.proto.IgniteQueryErrorCode.createJdbcSqlException(IgniteQueryErrorCode.java:57)
          at org.apache.ignite.internal.jdbc.JdbcStatement.execute0(JdbcStatement.java:154)
          at org.apache.ignite.internal.jdbc.JdbcStatement.executeQuery(JdbcStatement.java:111)
          at org.gridgain.ai3tests.tests.teststeps.JdbcSteps.executeQuery(JdbcSteps.java:91)
          at org.gridgain.ai3tests.tests.failover.ClusterFailoverTestBase.getActualResult(ClusterFailoverTestBase.java:336)
          at org.gridgain.ai3tests.tests.failover.ClusterFailoverTestBase.assertDataIsFilledWithoutErrors(ClusterFailoverTestBase.java:154)
          at org.gridgain.ai3tests.tests.failover.ClusterFailover3NodesTest.singleKillAndCheckOtherNodeWorks(ClusterFailover3NodesTest.java:96)
          at java.base/java.lang.reflect.Method.invoke(Method.java:566)
          at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
          at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
          at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
          at java.base/java.lang.Thread.run(Thread.java:834) 

            People

              apolovtcev Aleksandr Polovtsev
              lunigorn Igor