Ignite / IGNITE-22553

Internal Server Error after node replacement in 3 node cluster


Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • 3.0.0-beta2
    • None
    • general, persistence, rest
    • 3 nodes cluster (1 CMG node)

    • Docs Required, Release Notes Required

    Description

      This failure is intermittent: on some runs the same scenario fails with https://issues.apache.org/jira/browse/IGNITE-22517 instead.

      Steps to reproduce:

      1. Create a 3-node cluster with 1 CMG node (node_0 is the CMG node; node_1 and node_2 are not).
      2. Create a zone whose replica count equals the number of nodes (3).
      3. Create 10 tables inside the zone.
      4. Insert 100 rows into every table.
      5. Wait until the local state of every table/partition/node combination is "HEALTHY".
      6. Wait until the global state of every table/partition/node combination is "AVAILABLE".
      7. Kill a non-CMG node with kill -9 (node_1).
      8. Create a new node (node_1_replacement) and attach it to the cluster in place of the killed one. Update the lookup configs of the alive nodes so that they can find the new node.
      9. Using the REST API, wait until the physical topology contains 3 alive nodes.
      10. Using the REST API, wait until the logical topology contains 3 alive nodes.
      11. Wait until the local state of every table/partition/node combination is "HEALTHY" (queried through the REST endpoint of node_2).
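      Steps 5, 6, and 9 to 11 are all "poll until a condition holds" loops. The following is a minimal sketch of such a loop, not the code used by the test in this report. The class name AwaitSketch and the method await are hypothetical; the probe would wrap whatever REST call is being polled (for example the generated RecoveryApi.getLocalPartitionStates that appears in the stack trace below).

```java
import java.time.Duration;
import java.util.function.Predicate;
import java.util.function.Supplier;

/** Hypothetical polling helper; not part of Ignite or of the test framework in this report. */
final class AwaitSketch {
    /**
     * Calls probe repeatedly until done accepts its result or the timeout elapses.
     * A RuntimeException thrown by probe is treated as "not ready yet" and retried;
     * the last one is attached as the cause if the wait times out.
     */
    static <T> T await(Supplier<T> probe, Predicate<T> done, Duration timeout, Duration interval) {
        long deadline = System.nanoTime() + timeout.toNanos();
        RuntimeException lastError = null;
        while (true) {
            try {
                T value = probe.get();
                if (done.test(value)) {
                    return value;
                }
                lastError = null; // the probe worked, the condition is just not met yet
            } catch (RuntimeException e) {
                lastError = e; // remember the failure and keep polling
            }
            if (System.nanoTime() - deadline >= 0) {
                throw new IllegalStateException("Condition not met within " + timeout, lastError);
            }
            try {
                Thread.sleep(interval.toMillis());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt for the caller
                throw new IllegalStateException("Interrupted while waiting", e);
            }
        }
    }
}
```

      With a helper like this, step 9 would be a call that passes a probe returning the number of alive nodes in the physical topology and a predicate that checks for 3. Retrying on every RuntimeException is a deliberate simplification; a real test would probably retry only on an allow-list of exceptions so that errors like the one reported here surface instead of being swallowed until the timeout.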

      Expected:
      All partitions become "HEALTHY".

      Actual:
      The REST client throws the following exception:

      org.gridgain.ai3tests.core.generated.restapi.invoker.ApiException: Message: Internal Server Error
      HTTP response code: 500
      HTTP response body: {"title":"Internal Server Error","status":500,"code":"IGN-RECOVERY-3","type":null,"detail":"org.apache.ignite.internal.network.UnresolvableConsistentIdException: IGN-NETWORK-1 TraceId:1e07b3f1-9714-4cb6-ba12-eed3296fb821 Recipient consistent ID cannot be resolved: ClusterFailover3NodesTest_cluster_1","node":null,"traceId":"1e07b3f1-9714-4cb6-ba12-eed3296fb821","invalidParams":null}
      HTTP response headers: {connection=[keep-alive], content-length=[384], content-type=[application/json+problem], date=[Thu, 20 Jun 2024 15:43:07 GMT]}
        at app//org.gridgain.ai3tests.core.generated.restapi.invoker.ApiClient.handleResponse(ApiClient.java:1131)
        at app//org.gridgain.ai3tests.core.generated.restapi.invoker.ApiClient.execute(ApiClient.java:1044)
        at app//org.gridgain.ai3tests.core.generated.restapi.api.RecoveryApi.getLocalPartitionStatesWithHttpInfo(RecoveryApi.java:335)
        at app//org.gridgain.ai3tests.core.generated.restapi.api.RecoveryApi.getLocalPartitionStates(RecoveryApi.java:312)
        at app//org.gridgain.ai3tests.tests.failover.ClusterFailoverTestBase.getLocalPartitionsStatesDistinct(ClusterFailoverTestBase.java:501)
        at app//org.gridgain.ai3tests.tests.failover.ClusterFailoverTestBase.lambda$awaitLocalState$15(ClusterFailoverTestBase.java:489)
        at app//org.gridgain.ai3tests.core.utils.RetryUtils.retryOnAllowedException(RetryUtils.java:61)
        at app//org.gridgain.ai3tests.tests.failover.ClusterFailoverTestBase.awaitLocalState(ClusterFailoverTestBase.java:485)
        at app//org.gridgain.ai3tests.tests.failover.ClusterFailoverTestBase.awaitAllPartitionsHealthyAndAvailable(ClusterFailoverTestBase.java:461)
        at app//org.gridgain.ai3tests.tests.failover.ClusterFailover3NodesTest.killNodeAndReplaceWithNewEmptyOne(ClusterFailover3NodesTest.java:165)
        at java.base@17.0.6/java.lang.reflect.Method.invoke(Method.java:568)
        at java.base@17.0.6/java.util.concurrent.FutureTask.run(FutureTask.java:264)
        at java.base@17.0.6/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
        at java.base@17.0.6/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
        at java.base@17.0.6/java.lang.Thread.run(Thread.java:833)
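
      The response body carries an error code and a trace ID, which can help when matching the failure against the server logs. The following is a minimal sketch of pulling those two fields out of the body. The class name ProblemFields and the method stringField are hypothetical, and the regular expression is an illustration only; a proper JSON parser would be the more robust choice.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper for reading string fields from a problem-details response body. */
final class ProblemFields {
    /**
     * Returns the value of the first "name":"value" pair found in the text.
     * Fields whose value is not a JSON string (null, numbers) yield an empty result.
     * This is a regex shortcut, not a JSON parser: it does not track nesting.
     */
    static Optional<String> stringField(String json, String name) {
        Pattern pattern = Pattern.compile(
                "\"" + Pattern.quote(name) + "\"\\s*:\\s*\"((?:[^\"\\\\]|\\\\.)*)\"");
        Matcher matcher = pattern.matcher(json);
        return matcher.find() ? Optional.of(matcher.group(1)) : Optional.empty();
    }
}
```

      Applied to the body above, stringField(body, "code") yields IGN-RECOVERY-3 and stringField(body, "traceId") yields 1e07b3f1-9714-4cb6-ba12-eed3296fb821, which is the same trace ID that appears inside the "detail" message.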

      Server logs are attached.

      Attachments

        1. Server_logs.zip
          566 kB
          Igor

            People

              Assignee: Unassigned
              Reporter: Igor (lunigorn)