Ignite / IGNITE-23642

Unable to start a node due to too many assignments recovered


Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved

    Description

      There is a cluster with 3 nodes and 1200 partitions in total (400 per node). When the cluster is restarted, each node successfully recovers the Metastorage, the Metastorage leader is elected, and partition recovery then starts. This produces a lot of exceptions like the following in the logs:


      2024-11-08 13:23:28:845 +0000 [INFO][%node1%tableManager-io-15][NodeImpl] Node <48_part_3/node1> start vote and grant vote self, term=1.
      2024-11-08 13:23:28:846 +0000 [ERROR][%node1%Raft-Group-Client-14][RebalanceUtil] Exception on updating assignments for [tableId=38, name=INVENTORY, partition=23]
      java.util.concurrent.CompletionException: java.util.concurrent.TimeoutException: Send with retry timed out [retryCount = 7, groupId = metastorage_group, traceId = 5f329100-3de7-4ab8-a796-9969b7b91b22].
              at java.base/java.util.concurrent.CompletableFuture.encodeThrowable(Unknown Source)
              at java.base/java.util.concurrent.CompletableFuture.completeThrowable(Unknown Source)
              at java.base/java.util.concurrent.CompletableFuture$UniApply.tryFire(Unknown Source)
              at java.base/java.util.concurrent.CompletableFuture.postComplete(Unknown Source)
              at java.base/java.util.concurrent.CompletableFuture.completeExceptionally(Unknown Source)
              at org.apache.ignite.internal.raft.RaftGroupServiceImpl.sendWithRetry(RaftGroupServiceImpl.java:559)
              at org.apache.ignite.internal.raft.RaftGroupServiceImpl.lambda$scheduleRetry$40(RaftGroupServiceImpl.java:750)
              at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
              at java.base/java.util.concurrent.FutureTask.run(Unknown Source)
              at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(Unknown Source)
              at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
              at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
              at java.base/java.lang.Thread.run(Unknown Source)


      Also, there is another stack trace:


      2024-11-08 13:27:03:523 +0000 [WARNING][%node1%rebalance-scheduler-11][RebalanceRaftGroupEventsListener] Unable to start rebalance [tablePartitionId, term=44_part_45]
      java.util.concurrent.ExecutionException: java.util.concurrent.TimeoutException: Send with retry timed out [retryCount = 7, groupId = metastorage_group, traceId = d52b447e-3c40-4f4b-9c67-863be811b0cb].
              at java.base/java.util.concurrent.CompletableFuture.reportGet(Unknown Source)
              at java.base/java.util.concurrent.CompletableFuture.get(Unknown Source)
              at org.apache.ignite.internal.distributionzones.rebalance.RebalanceRaftGroupEventsListener.lambda$onLeaderElected$0(RebalanceRaftGroupEventsListener.java:167)
              at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
              at java.base/java.util.concurrent.FutureTask.run(Unknown Source)
              at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(Unknown Source)
              at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
              at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
              at java.base/java.lang.Thread.run(Unknown Source)
      Caused by: java.util.concurrent.TimeoutException: Send with retry timed out [retryCount = 7, groupId = metastorage_group, traceId = d52b447e-3c40-4f4b-9c67-863be811b0cb].
              at org.apache.ignite.internal.raft.RaftGroupServiceImpl.sendWithRetry(RaftGroupServiceImpl.java:559)
              at org.apache.ignite.internal.raft.RaftGroupServiceImpl.lambda$scheduleRetry$40(RaftGroupServiceImpl.java:750)
              ... 6 more


      It seems that an avalanche of Metastorage accesses by hundreds of starting partitions overloads the Metastorage leader, so recovery fails with TimeoutExceptions.

      We could probably solve this by rate limiting Metastorage accesses, either for the recovery procedure alone or for normal operation as well.

      High-priority accesses (Metastorage SafeTime propagation, Lease updates) should not be subject to rate limiting.
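
      As a rough sketch of that idea (names like MetastorageAccessLimiter and AccessPriority are hypothetical and do not exist in the code base), non-critical Metastorage calls issued during recovery could be funneled through a simple concurrency limiter, while high-priority calls bypass it:

      // Hypothetical sketch only: a client-side concurrency limiter for Metastorage
      // accesses issued during partition recovery. MetastorageAccessLimiter and
      // AccessPriority are illustrative names, not existing Ignite classes.
      import java.util.concurrent.CompletableFuture;
      import java.util.concurrent.Semaphore;
      import java.util.function.Supplier;

      public class MetastorageAccessLimiter {
          /** Maximum number of throttled Metastorage calls allowed in flight at once. */
          private final Semaphore permits;

          public MetastorageAccessLimiter(int maxConcurrentAccesses) {
              this.permits = new Semaphore(maxConcurrentAccesses);
          }

          public enum AccessPriority { NORMAL, HIGH }

          /**
           * Runs the given Metastorage call. NORMAL-priority calls (e.g. assignment updates
           * from hundreds of recovering partitions) are throttled so they cannot pile up on
           * the Metastorage leader at once; HIGH-priority calls (SafeTime propagation,
           * lease updates) bypass the limiter entirely.
           */
          public <T> CompletableFuture<T> invoke(AccessPriority priority, Supplier<CompletableFuture<T>> call) {
              if (priority == AccessPriority.HIGH) {
                  return call.get();
              }

              // Simplified for the sketch: waiting for a permit blocks a common-pool thread.
              return CompletableFuture.runAsync(permits::acquireUninterruptibly)
                      .thenCompose(unused -> call.get())
                      .whenComplete((result, error) -> permits.release());
          }
      }

      A semaphore caps the number of in-flight calls rather than the request rate; a token-bucket style rate limiter on the Metastorage client, or batching the recovery-time assignment updates, would be alternative shapes of the same fix.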

People

    Assignee: Alexander Lapin (alapin)
    Reporter: Roman Puchkovskiy (rpuch)
