  1. Ignite
  2. IGNITE-23618

Fix deadlock when restoring metastorage


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • None
    • 3.0
    • None

    Description

      A deadlock was found during metastorage recovery. Stack trace of the problem:

        [2024-11-05T11:33:20,693][WARN ][%iicrt_ccbp_1%common-scheduler-0][FailureManager] Possible failure suppressed according to a configured handler [hnd=NoOpFailureHandler [super=AbstractFailureHandler [ignoredFailureTypes=UnmodifiableSet [SYSTEM_WORKER_BLOCKED, SYSTEM_CRITICAL_OPERATION_TIMEOUT]]], failureCtx=SYSTEM_WORKER_BLOCKED]
            org.apache.ignite.lang.IgniteException: A critical thread is blocked for 804 ms that is more than the allowed 500 ms, it is "%iicrt_ccbp_1%MessagingService-inbound-0-0" prio=10 Id=4987 WAITING on java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync@558a94f4 owned by "%iicrt_ccbp_1%JRaft-FSMCaller-Disruptormetastorage_stripe_0-0" Id=5020
              at java.base@11.0.17/jdk.internal.misc.Unsafe.park(Native Method)
              -  waiting on java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync@558a94f4
              at java.base@11.0.17/java.util.concurrent.locks.LockSupport.park(LockSupport.java:194)
              at java.base@11.0.17/java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:885)
              at java.base@11.0.17/java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:917)
              at java.base@11.0.17/java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1240)
              at java.base@11.0.17/java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.lock(ReentrantReadWriteLock.java:959)
              at app//org.apache.ignite.internal.metastorage.server.AbstractKeyValueStorage.setRecoveryRevisionsListener(AbstractKeyValueStorage.java:318)
              at app//org.apache.ignite.internal.metastorage.impl.RecoveryRevisionsListenerImpl.completeRecoveryFinishFutureIfPossible(RecoveryRevisionsListenerImpl.java:92)
              at app//org.apache.ignite.internal.metastorage.impl.RecoveryRevisionsListenerImpl.setTargetRevisions(RecoveryRevisionsListenerImpl.java:73)
              at app//org.apache.ignite.internal.metastorage.impl.MetaStorageManagerImpl.lambda$recover$1(MetaStorageManagerImpl.java:327)
              at app//org.apache.ignite.internal.metastorage.impl.MetaStorageManagerImpl$$Lambda$1899/0x0000000800bf4c40.accept(Unknown Source)
              at java.base@11.0.17/java.util.concurrent.CompletableFuture$UniAccept.tryFire(CompletableFuture.java:714)
              at java.base@11.0.17/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506)
              at java.base@11.0.17/java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:2073)
              at app//org.apache.ignite.internal.raft.RaftGroupServiceImpl.lambda$sendWithRetry$39(RaftGroupServiceImpl.java:592)
              at app//org.apache.ignite.internal.raft.RaftGroupServiceImpl$$Lambda$1798/0x0000000800bb6c40.accept(Unknown Source)
              at java.base@11.0.17/java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:859)
              at java.base@11.0.17/java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:837)
              at java.base@11.0.17/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506)
              at java.base@11.0.17/java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:2073)
              at app//org.apache.ignite.internal.network.DefaultMessagingService.onInvokeResponse(DefaultMessagingService.java:587)
              at app//org.apache.ignite.internal.network.DefaultMessagingService.handleInvokeResponse(DefaultMessagingService.java:478)
              at app//org.apache.ignite.internal.network.DefaultMessagingService.lambda$handleMessageFromNetwork$4(DefaultMessagingService.java:412)
              at app//org.apache.ignite.internal.network.DefaultMessagingService$$Lambda$1866/0x0000000800bde040.run(Unknown Source)
              at java.base@11.0.17/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
              at java.base@11.0.17/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
              at java.base@11.0.17/java.lang.Thread.run(Thread.java:834)
            
              Number of locked synchronizers = 2
              - java.util.concurrent.locks.ReentrantLock$NonfairSync@c55b7e
              - java.util.concurrent.ThreadPoolExecutor$Worker@526479eb
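
The trace above shows a CompletableFuture callback running inline on the thread that completed the future (the messaging inbound thread) and parking on a write lock owned by another thread. The following is a minimal, self-contained sketch of that blocking pattern only. It is not the Ignite code and not the fix: the class name, method name, and thread name are hypothetical, and the lock owner here simply holds the lock until released, whereas the trace does not show what the real owner thread was waiting for.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical illustration; none of these names come from Ignite.
class BlockedCallbackSketch {
    /** Returns true if the callback failed to take the write lock within the timeout. */
    public static boolean callbackBlocksOnWriteLock(long timeoutMillis) throws Exception {
        ReentrantReadWriteLock rwLock = new ReentrantReadWriteLock();
        CompletableFuture<Void> response = new CompletableFuture<>();
        CountDownLatch lockHeld = new CountDownLatch(1);
        CountDownLatch release = new CountDownLatch(1);

        // Stand-in for the thread that owns the lock in the trace.
        Thread owner = new Thread(() -> {
            rwLock.writeLock().lock();
            try {
                lockHeld.countDown();
                release.await(); // keep holding the lock until told to stop
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                rwLock.writeLock().unlock();
            }
        }, "lock-owner");
        owner.start();
        lockHeld.await();

        // A non-async dependent registered before completion runs on the
        // thread that later calls complete().
        CompletableFuture<Boolean> acquired = response.thenApply(v -> {
            try {
                // tryLock with a timeout so the sketch terminates; the code in
                // the trace calls lock(), which waits without a time limit.
                boolean ok = rwLock.writeLock().tryLock(timeoutMillis, TimeUnit.MILLISECONDS);
                if (ok) {
                    rwLock.writeLock().unlock();
                }
                return ok;
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        });

        // Stand-in for the inbound thread: completing the future executes the
        // callback inline, so this call stalls for the whole timeout.
        response.complete(null);

        boolean blocked = !acquired.get();
        release.countDown();
        owner.join();
        return blocked;
    }

    public static void main(String[] args) throws Exception {
        System.out.println("callback blocked: " + callbackBlocksOnWriteLock(200));
    }
}
```

Running main prints "callback blocked: true". One common way to keep a completing thread from stalling like this is to register the dependent with an async variant such as thenApplyAsync and an explicit executor. Whether that matches the actual fix for this issue is not stated here.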
      

            People

              ktkalenko@gridgain.com Kirill Tkalenko
              Philipp Shergalis

                Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 20m