Uploaded image for project: 'Ignite'
  1. Ignite
  2. IGNITE-12594

Deadlock between GridCacheDataStore#purgeExpiredInternal and GridNearTxLocal#enlistWriteEntry

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Resolved
    • Major
    • Resolution: Fixed
    • None
    • 2.8
    • None
    • None

    Description

      The deadlock is reproduced occasionally in PDS3 suite and can be seen in the thread dump below.
      One thread attempts to unwind evicts, acquires checkpoint read lock and then locks GridCacheMapEntry. Another thread does GridCacheMapEntry#unswap, determines that the entry is expired and acquires checkpoint read lock to remove the entry from the store.
      We should not acquire checkpoint read lock inside of a locked GridCacheMapEntry.

      Thread [name="updater-1", id=29900, state=WAITING, blockCnt=2, waitCnt=4450]
          Lock [object=java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync@2fc51685, ownerName=null, ownerId=-1]
              at sun.misc.Unsafe.park(Native Method)
              at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
              at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
              at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireShared(AbstractQueuedSynchronizer.java:967)
              at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireShared(AbstractQueuedSynchronizer.java:1283)
              at java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.lock(ReentrantReadWriteLock.java:727)
              at o.a.i.i.processors.cache.persistence.GridCacheDatabaseSharedManager.checkpointReadLock(GridCacheDatabaseSharedManager.java:1632)   <- CP read lock
              at o.a.i.i.processors.cache.GridCacheMapEntry.onExpired(GridCacheMapEntry.java:4081)
              at o.a.i.i.processors.cache.GridCacheMapEntry.unswap(GridCacheMapEntry.java:559)
              at o.a.i.i.processors.cache.GridCacheMapEntry.unswap(GridCacheMapEntry.java:519)                                                      <- locked entry
              at o.a.i.i.processors.cache.distributed.near.GridNearTxLocal.enlistWriteEntry(GridNearTxLocal.java:1437)
              at o.a.i.i.processors.cache.distributed.near.GridNearTxLocal.enlistWrite(GridNearTxLocal.java:1303)
              at o.a.i.i.processors.cache.distributed.near.GridNearTxLocal.putAllAsync0(GridNearTxLocal.java:957)
              at o.a.i.i.processors.cache.distributed.near.GridNearTxLocal.putAllAsync(GridNearTxLocal.java:491)
              at o.a.i.i.processors.cache.GridCacheAdapter$29.inOp(GridCacheAdapter.java:2526)
              at o.a.i.i.processors.cache.GridCacheAdapter$SyncInOp.op(GridCacheAdapter.java:4727)
              at o.a.i.i.processors.cache.GridCacheAdapter.syncOp(GridCacheAdapter.java:3740)
              at o.a.i.i.processors.cache.GridCacheAdapter.putAll0(GridCacheAdapter.java:2524)
              at o.a.i.i.processors.cache.GridCacheAdapter.putAll(GridCacheAdapter.java:2513)
              at o.a.i.i.processors.cache.IgniteCacheProxyImpl.putAll(IgniteCacheProxyImpl.java:1264)
              at o.a.i.i.processors.cache.GatewayProtectedCacheProxy.putAll(GatewayProtectedCacheProxy.java:863)
              at o.a.i.i.processors.cache.persistence.IgnitePdsContinuousRestartTest$1.call(IgnitePdsContinuousRestartTest.java:291)
              at o.a.i.testframework.GridTestThread.run(GridTestThread.java:83)
      
          Locked synchronizers:
              java.util.concurrent.locks.ReentrantLock$NonfairSync@762613f7
      
      
      Thread [name="sys-stripe-0-#24086%persistence.IgnitePdsContinuousRestartTestWithExpiryPolicy0%", id=29617, state=WAITING, blockCnt=2, waitCnt=65381]
          Lock [object=java.util.concurrent.locks.ReentrantLock$NonfairSync@762613f7, ownerName=updater-1, ownerId=29900]
              at sun.misc.Unsafe.park(Native Method)
              at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
              at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
              at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
              at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
              at java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:209)
              at java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:285)                                                                       <- lock entry
              at o.a.i.i.processors.cache.GridCacheMapEntry.lockEntry(GridCacheMapEntry.java:5017)
              at o.a.i.i.processors.cache.GridCacheMapEntry.markObsoleteVersion(GridCacheMapEntry.java:2799)
              at o.a.i.i.processors.cache.distributed.dht.topology.GridDhtLocalPartition.removeVersionedEntry(GridDhtLocalPartition.java:392)
              at o.a.i.i.processors.cache.distributed.dht.topology.GridDhtLocalPartition.cleanupRemoveQueue(GridDhtLocalPartition.java:416)
              at o.a.i.i.processors.cache.distributed.dht.topology.GridDhtLocalPartition.onDeferredDelete(GridDhtLocalPartition.java:441)
              at o.a.i.i.processors.cache.distributed.dht.GridDhtCacheAdapter.onDeferredDelete(GridDhtCacheAdapter.java:1696)
              at o.a.i.i.processors.cache.GridCacheContext.onDeferredDelete(GridCacheContext.java:1710)
              at o.a.i.i.processors.cache.GridCacheMapEntry.onTtlExpired(GridCacheMapEntry.java:4037)
              at o.a.i.i.processors.cache.GridCacheTtlManager$1.applyx(GridCacheTtlManager.java:75)
              at o.a.i.i.processors.cache.GridCacheTtlManager$1.applyx(GridCacheTtlManager.java:66)
              at o.a.i.i.util.lang.IgniteInClosure2X.apply(IgniteInClosure2X.java:37)
              at o.a.i.i.processors.cache.persistence.GridCacheOffheapManager$GridCacheDataStore.purgeExpiredInternal(GridCacheOffheapManager.java:2725)     <- CP read lock
              at o.a.i.i.processors.cache.persistence.GridCacheOffheapManager$GridCacheDataStore.purgeExpired(GridCacheOffheapManager.java:2651)
              at o.a.i.i.processors.cache.persistence.GridCacheOffheapManager.expire(GridCacheOffheapManager.java:1047)
              at o.a.i.i.processors.cache.GridCacheTtlManager.expire(GridCacheTtlManager.java:242)
              at o.a.i.i.processors.cache.GridCacheUtils.unwindEvicts(GridCacheUtils.java:874)
              at o.a.i.i.processors.cache.transactions.IgniteTxStateImpl.unwindEvicts(IgniteTxStateImpl.java:106)
              at o.a.i.i.processors.cache.GridCacheIoManager.onMessageProcessed(GridCacheIoManager.java:1182)
              at o.a.i.i.processors.cache.GridCacheIoManager.processMessage(GridCacheIoManager.java:1161)
              at o.a.i.i.processors.cache.GridCacheIoManager.onMessage0(GridCacheIoManager.java:591)
              at o.a.i.i.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:392)
              at o.a.i.i.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:318)
              at o.a.i.i.processors.cache.GridCacheIoManager.access$100(GridCacheIoManager.java:109)
              at o.a.i.i.processors.cache.GridCacheIoManager$1.onMessage(GridCacheIoManager.java:308)
              at o.a.i.i.managers.communication.GridIoManager.invokeListener(GridIoManager.java:1607)
              at o.a.i.i.managers.communication.GridIoManager.processRegularMessage0(GridIoManager.java:1231)
              at o.a.i.i.managers.communication.GridIoManager.access$4300(GridIoManager.java:132)
              at o.a.i.i.managers.communication.GridIoManager$8.run(GridIoManager.java:1124)
              at o.a.i.i.util.StripedExecutor$Stripe.body(StripedExecutor.java:559)
              at o.a.i.i.util.worker.GridWorker.run(GridWorker.java:119)
              at java.lang.Thread.run(Thread.java:748)

      Reproduced by PDS 3 https://ggtc.gridgain.com/viewLog.html?buildId=2706284&buildTypeId=Tests_GridGainCeEeUe_Latest_CE_Pds3&tab=buildResultsDiv&branch_Tests_GridGainCeEeUe_Latest_CE=<default>

      Attachments

        Issue Links

          Activity

            People

              sergey-chugunov Sergey Chugunov
              akalashnikov Anton Kalashnikov
              Votes:
              0 Vote for this issue
              Watchers:
              4 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Time Tracking

                  Estimated:
                  Original Estimate - Not Specified
                  Not Specified
                  Remaining:
                  Remaining Estimate - 0h
                  0h
                  Logged:
                  Time Spent - 20m
                  20m