Ignite / IGNITE-23642

Unable to start a node due to too many assignments recovered


Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved

    Description

      There is a cluster with 3 nodes and 1200 partitions in total (400 per node). When the cluster is restarted, each node successfully recovers the Metastorage, the Metastorage leader is elected, and partition recovery then starts. This produces a lot of exceptions like the following in the logs:


      2024-11-08 13:23:28:845 +0000 [INFO][%node1%tableManager-io-15][NodeImpl] Node <48_part_3/node1> start vote and grant vote self, term=1.
      2024-11-08 13:23:28:846 +0000 [ERROR][%node1%Raft-Group-Client-14][RebalanceUtil] Exception on updating assignments for [tableId=38, name=INVENTORY, partition=23]
      java.util.concurrent.CompletionException: java.util.concurrent.TimeoutException: Send with retry timed out [retryCount = 7, groupId = metastorage_group, traceId = 5f329100-3de7-4ab8-a796-9969b7b91b22].
              at java.base/java.util.concurrent.CompletableFuture.encodeThrowable(Unknown Source)
              at java.base/java.util.concurrent.CompletableFuture.completeThrowable(Unknown Source)
              at java.base/java.util.concurrent.CompletableFuture$UniApply.tryFire(Unknown Source)
              at java.base/java.util.concurrent.CompletableFuture.postComplete(Unknown Source)
              at java.base/java.util.concurrent.CompletableFuture.completeExceptionally(Unknown Source)
              at org.apache.ignite.internal.raft.RaftGroupServiceImpl.sendWithRetry(RaftGroupServiceImpl.java:559)
              at org.apache.ignite.internal.raft.RaftGroupServiceImpl.lambda$scheduleRetry$40(RaftGroupServiceImpl.java:750)
              at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
              at java.base/java.util.concurrent.FutureTask.run(Unknown Source)
              at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(Unknown Source)
              at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
              at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
              at java.base/java.lang.Thread.run(Unknown Source)


      Also, there is another stack trace:


      2024-11-08 13:27:03:523 +0000 [WARNING][%node1%rebalance-scheduler-11][RebalanceRaftGroupEventsListener] Unable to start rebalance [tablePartitionId, term=44_part_45]
      java.util.concurrent.ExecutionException: java.util.concurrent.TimeoutException: Send with retry timed out [retryCount = 7, groupId = metastorage_group, traceId = d52b447e-3c40-4f4b-9c67-863be811b0cb].
              at java.base/java.util.concurrent.CompletableFuture.reportGet(Unknown Source)
              at java.base/java.util.concurrent.CompletableFuture.get(Unknown Source)
              at org.apache.ignite.internal.distributionzones.rebalance.RebalanceRaftGroupEventsListener.lambda$onLeaderElected$0(RebalanceRaftGroupEventsListener.java:167)
              at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
              at java.base/java.util.concurrent.FutureTask.run(Unknown Source)
              at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(Unknown Source)
              at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
              at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
              at java.base/java.lang.Thread.run(Unknown Source)
      Caused by: java.util.concurrent.TimeoutException: Send with retry timed out [retryCount = 7, groupId = metastorage_group, traceId = d52b447e-3c40-4f4b-9c67-863be811b0cb].
              at org.apache.ignite.internal.raft.RaftGroupServiceImpl.sendWithRetry(RaftGroupServiceImpl.java:559)
              at org.apache.ignite.internal.raft.RaftGroupServiceImpl.lambda$scheduleRetry$40(RaftGroupServiceImpl.java:750)
              ... 6 more


      It seems that an avalanche of Metastorage accesses by hundreds of starting partitions overloads the Metastorage leader, so recovery fails with TimeoutExceptions.

      We could probably solve this by rate limiting Metastorage accesses, either for the recovery procedure alone or for normal operation as well.

      High-priority accesses (Metastorage SafeTime propagation, Lease updates) should not be subject to rate limiting.
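
      As a rough sketch of that idea (names like MetastorageAccessLimiter and AccessPriority are hypothetical and do not exist in the code base), non-critical Metastorage calls issued during recovery could be funneled through a simple concurrency limiter, while high-priority calls bypass it:

      // Hypothetical sketch only: a client-side concurrency limiter for Metastorage
      // accesses issued during partition recovery. MetastorageAccessLimiter and
      // AccessPriority are illustrative names, not existing Ignite classes.
      import java.util.concurrent.CompletableFuture;
      import java.util.concurrent.Semaphore;
      import java.util.function.Supplier;

      public class MetastorageAccessLimiter {
          /** Maximum number of throttled Metastorage calls allowed in flight at once. */
          private final Semaphore permits;

          public MetastorageAccessLimiter(int maxConcurrentAccesses) {
              this.permits = new Semaphore(maxConcurrentAccesses);
          }

          public enum AccessPriority { NORMAL, HIGH }

          /**
           * Runs the given Metastorage call. NORMAL-priority calls (e.g. assignment updates
           * from hundreds of recovering partitions) are throttled so they cannot pile up on
           * the Metastorage leader at once; HIGH-priority calls (SafeTime propagation,
           * lease updates) bypass the limiter entirely.
           */
          public <T> CompletableFuture<T> invoke(AccessPriority priority, Supplier<CompletableFuture<T>> call) {
              if (priority == AccessPriority.HIGH) {
                  return call.get();
              }

              // Simplified for the sketch: waiting for a permit blocks a common-pool thread.
              return CompletableFuture.runAsync(permits::acquireUninterruptibly)
                      .thenCompose(unused -> call.get())
                      .whenComplete((result, error) -> permits.release());
          }
      }

      A semaphore caps the number of in-flight calls rather than the request rate; a token-bucket style rate limiter on the Metastorage client, or batching the recovery-time assignment updates, would be alternative shapes of the same fix.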

People

    Assignee: Alexander Lapin (alapin)
    Reporter: Roman Puchkovskiy (rpuch)
