Details
- Bug
- Status: Open
- Major
- Resolution: Unresolved
- 3.0
- None
- 3 nodes cluster (1 CMG node)
- Docs Required, Release Notes Required
Description
This failure is intermittent: on some runs the same scenario fails with https://issues.apache.org/jira/browse/IGNITE-22517 instead.
Steps to reproduce:
- Create a 3-node cluster with 1 CMG node (node_0 is the CMG node; node_1 and node_2 are not).
- Create a zone whose replica count equals the number of nodes (3).
- Create 10 tables in the zone.
- Insert 100 rows into every table.
- Wait until the local state is "HEALTHY" for all tables*partitions*nodes.
- Wait until the global state is "AVAILABLE" for all tables*partitions*nodes.
- Kill a non-CMG node with kill -9 (node_1).
- Start a new node (node_1_replacement) and attach it to the cluster in place of the killed one, updating the lookup configs of the alive nodes so that they can find it.
- Using the REST API, wait until the physical topology has 3 alive nodes.
- Using the REST API, wait until the logical topology has 3 alive nodes.
- Wait until the local state is "HEALTHY" for all tables*partitions*nodes (querying the REST API of node_2).
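The "wait until" steps above can be sketched as a polling loop. This is only an illustration under stated assumptions, not the actual test code: `PartitionStateAwaiter` and `awaitAllStates` are hypothetical names, and the supplier stands in for whatever REST call returns the set of distinct partition states. Exceptions from the supplier are treated as transient and retried until the timeout, after which the last one is rethrown, which is roughly how a failure like the one reported here would surface.

```java
import java.time.Duration;
import java.util.Set;
import java.util.function.Supplier;

/** Hypothetical helper: polls a state source until every partition reports the expected state. */
public class PartitionStateAwaiter {

    /**
     * Returns true once the supplier yields exactly {expected}; returns false if the
     * timeout elapses while other states are still present. If the last attempt before
     * the timeout threw, that exception is rethrown instead.
     */
    public static boolean awaitAllStates(
            Supplier<Set<String>> distinctStates, // assumption: wraps the REST call
            String expected,
            Duration timeout,
            Duration pollInterval) {
        long deadline = System.nanoTime() + timeout.toNanos();
        RuntimeException lastError = null;

        while (true) {
            try {
                Set<String> states = distinctStates.get();
                if (states.equals(Set.of(expected))) {
                    return true;
                }
                lastError = null; // the call succeeded, states just differ
            } catch (RuntimeException e) {
                lastError = e; // treat as transient and retry until the deadline
            }

            if (System.nanoTime() - deadline >= 0) {
                break;
            }

            try {
                Thread.sleep(pollInterval.toMillis());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("Interrupted while waiting for state " + expected, e);
            }
        }

        if (lastError != null) {
            throw lastError;
        }
        return false;
    }
}
```

A caller would pass a lambda that queries the REST API of node_2 and collects the distinct local states, with "HEALTHY" as the expected value.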
Expected:
All partitions become "HEALTHY".
Actual:
The exception from REST client:
org.gridgain.ai3tests.core.generated.restapi.invoker.ApiException: Message: Internal Server Error
HTTP response code: 500
HTTP response body: {"title":"Internal Server Error","status":500,"code":"IGN-RECOVERY-3","type":null,"detail":"org.apache.ignite.internal.network.UnresolvableConsistentIdException: IGN-NETWORK-1 TraceId:1e07b3f1-9714-4cb6-ba12-eed3296fb821 Recipient consistent ID cannot be resolved: ClusterFailover3NodesTest_cluster_1","node":null,"traceId":"1e07b3f1-9714-4cb6-ba12-eed3296fb821","invalidParams":null}
HTTP response headers: {connection=[keep-alive], content-length=[384], content-type=[application/json+problem], date=[Thu, 20 Jun 2024 15:43:07 GMT]}
    at app//org.gridgain.ai3tests.core.generated.restapi.invoker.ApiClient.handleResponse(ApiClient.java:1131)
    at app//org.gridgain.ai3tests.core.generated.restapi.invoker.ApiClient.execute(ApiClient.java:1044)
    at app//org.gridgain.ai3tests.core.generated.restapi.api.RecoveryApi.getLocalPartitionStatesWithHttpInfo(RecoveryApi.java:335)
    at app//org.gridgain.ai3tests.core.generated.restapi.api.RecoveryApi.getLocalPartitionStates(RecoveryApi.java:312)
    at app//org.gridgain.ai3tests.tests.failover.ClusterFailoverTestBase.getLocalPartitionsStatesDistinct(ClusterFailoverTestBase.java:501)
    at app//org.gridgain.ai3tests.tests.failover.ClusterFailoverTestBase.lambda$awaitLocalState$15(ClusterFailoverTestBase.java:489)
    at app//org.gridgain.ai3tests.core.utils.RetryUtils.retryOnAllowedException(RetryUtils.java:61)
    at app//org.gridgain.ai3tests.tests.failover.ClusterFailoverTestBase.awaitLocalState(ClusterFailoverTestBase.java:485)
    at app//org.gridgain.ai3tests.tests.failover.ClusterFailoverTestBase.awaitAllPartitionsHealthyAndAvailable(ClusterFailoverTestBase.java:461)
    at app//org.gridgain.ai3tests.tests.failover.ClusterFailover3NodesTest.killNodeAndReplaceWithNewEmptyOne(ClusterFailover3NodesTest.java:165)
    at java.base@17.0.6/java.lang.reflect.Method.invoke(Method.java:568)
    at java.base@17.0.6/java.util.concurrent.FutureTask.run(FutureTask.java:264)
    at java.base@17.0.6/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
    at java.base@17.0.6/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
    at java.base@17.0.6/java.lang.Thread.run(Thread.java:833)
Server logs are attached.
Attachments
Issue Links
- is related to IGNITE-22517 Partitions stay DEGRADED after node replacement in 3 node cluster (Open)