[IGNITE-22553] Internal Server Error after node replacement in 3 node cluster - ASF JIRA

Details

Type: Bug
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: 3.0
Fix Version/s: None
Component/s: general, persistence, rest
Labels:
- ignite-3
Environment:

3 nodes cluster (1 CMG node)

Ignite Flags:

Docs Required, Release Notes Required

Description

This failure is intermittent: on some runs the same scenario instead produces https://issues.apache.org/jira/browse/IGNITE-22517.

Steps to reproduce:

Create a 3-node cluster with 1 CMG node (node_0 is the CMG node; node_1 and node_2 are not).
Create a zone with the number of replicas equal to the number of nodes (3).
Create 10 tables in the zone.
Insert 100 rows into every table.
Await until the local state for all tables*partitions*nodes is "HEALTHY".
Await until the global state for all tables*partitions*nodes is "AVAILABLE".
Kill a non-CMG node with kill -9 (node_1).
Create a new node (node_1_replacement) and attach it to the cluster in place of the killed one, updating the lookup configs of the alive nodes so that they can find it.
Using the REST API, await until the physical topology has 3 alive nodes.
Using the REST API, await until the logical topology has 3 alive nodes.
Await until the local state for all tables*partitions*nodes is "HEALTHY" (by querying the REST API of node_2).
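
The "await" steps above can be sketched as a simple polling loop. This is only an illustration under assumptions: `PartitionStateAwaiter`, `allInState`, and `await` are hypothetical names, not part of Ignite or of the test framework seen in the stack trace, and the REST call itself is abstracted behind a `Supplier` because the exact endpoint is not given in this report.

```java
import java.util.List;
import java.util.function.Supplier;

// Hypothetical helper for illustration; not part of Ignite or the test framework.
class PartitionStateAwaiter {
    /** Returns true when the list is non-empty and every state equals the expected one. */
    static boolean allInState(List<String> states, String expected) {
        return !states.isEmpty() && states.stream().allMatch(expected::equals);
    }

    /**
     * Calls fetchStates up to maxAttempts times, sleeping delayMillis between attempts,
     * and returns true as soon as all returned states equal the expected one.
     * Exceptions thrown by fetchStates (for example, an HTTP 500 from the REST call)
     * are not swallowed and propagate to the caller.
     */
    static boolean await(Supplier<List<String>> fetchStates, String expected,
                         int maxAttempts, long delayMillis) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if (allInState(fetchStates.get(), expected)) {
                return true;
            }
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(delayMillis);
                } catch (InterruptedException e) {
                    // Restore the interrupt flag and give up waiting.
                    Thread.currentThread().interrupt();
                    return false;
                }
            }
        }
        return false;
    }
}
```

In such a loop, the supplier would wrap the REST request that returns the distinct local (or global) partition states, and a server-side error would surface as an exception on the first failing attempt rather than as a timeout.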

Expected:
All partitions become "HEALTHY".

Actual:
The REST client fails with the following exception:

org.gridgain.ai3tests.core.generated.restapi.invoker.ApiException: Message: Internal Server Error
HTTP response code: 500
HTTP response body: {"title":"Internal Server Error","status":500,"code":"IGN-RECOVERY-3","type":null,"detail":"org.apache.ignite.internal.network.UnresolvableConsistentIdException: IGN-NETWORK-1 TraceId:1e07b3f1-9714-4cb6-ba12-eed3296fb821 Recipient consistent ID cannot be resolved: ClusterFailover3NodesTest_cluster_1","node":null,"traceId":"1e07b3f1-9714-4cb6-ba12-eed3296fb821","invalidParams":null}
HTTP response headers: {connection=[keep-alive], content-length=[384], content-type=[application/json+problem], date=[Thu, 20 Jun 2024 15:43:07 GMT]}
  at app//org.gridgain.ai3tests.core.generated.restapi.invoker.ApiClient.handleResponse(ApiClient.java:1131)
  at app//org.gridgain.ai3tests.core.generated.restapi.invoker.ApiClient.execute(ApiClient.java:1044)
  at app//org.gridgain.ai3tests.core.generated.restapi.api.RecoveryApi.getLocalPartitionStatesWithHttpInfo(RecoveryApi.java:335)
  at app//org.gridgain.ai3tests.core.generated.restapi.api.RecoveryApi.getLocalPartitionStates(RecoveryApi.java:312)
  at app//org.gridgain.ai3tests.tests.failover.ClusterFailoverTestBase.getLocalPartitionsStatesDistinct(ClusterFailoverTestBase.java:501)
  at app//org.gridgain.ai3tests.tests.failover.ClusterFailoverTestBase.lambda$awaitLocalState$15(ClusterFailoverTestBase.java:489)
  at app//org.gridgain.ai3tests.core.utils.RetryUtils.retryOnAllowedException(RetryUtils.java:61)
  at app//org.gridgain.ai3tests.tests.failover.ClusterFailoverTestBase.awaitLocalState(ClusterFailoverTestBase.java:485)
  at app//org.gridgain.ai3tests.tests.failover.ClusterFailoverTestBase.awaitAllPartitionsHealthyAndAvailable(ClusterFailoverTestBase.java:461)
  at app//org.gridgain.ai3tests.tests.failover.ClusterFailover3NodesTest.killNodeAndReplaceWithNewEmptyOne(ClusterFailover3NodesTest.java:165)
  at java.base@17.0.6/java.lang.reflect.Method.invoke(Method.java:568)
  at java.base@17.0.6/java.util.concurrent.FutureTask.run(FutureTask.java:264)
  at java.base@17.0.6/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
  at java.base@17.0.6/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
  at java.base@17.0.6/java.lang.Thread.run(Thread.java:833)

Server logs are attached.
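
To match the failed request against the server logs, it can help to pull the `code` and `traceId` fields out of the HTTP response body shown above. The following is a minimal sketch, assuming a regex-based lookup is good enough for this flat JSON body; `ProblemFields` and `stringField` are hypothetical names, and a real JSON parser would be more robust (escaped quotes inside values are not handled here).

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of Ignite or the test framework.
class ProblemFields {
    /**
     * Returns the value of the first occurrence of "name":"value" in the given JSON text.
     * Returns empty if the field is absent or its value is not a string (for example, null).
     */
    static Optional<String> stringField(String json, String name) {
        Pattern pattern = Pattern.compile("\"" + Pattern.quote(name) + "\"\\s*:\\s*\"([^\"]*)\"");
        Matcher matcher = pattern.matcher(json);
        return matcher.find() ? Optional.of(matcher.group(1)) : Optional.empty();
    }
}
```

Applied to the response body in this report, `stringField(body, "traceId")` would yield the trace ID that can then be searched for in the attached logs.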

Attachments

Server_logs.zip
20/Jun/24 18:05
566 kB
Igor

Issue Links

is related to

IGNITE-22517 Partitions stay DEGRADED after node replacement in 3 node cluster

Open