Jackrabbit Content Repository
JCR-2855

Writers blocked forever when waiting on update operations

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.1.3, 2.2.1
    • Fix Version/s: 2.2.4
    • Component/s: jackrabbit-core
    • Labels: None

      Description

      Thread 1 calls Session.save() and has a write lock.

      Thread 2 is in XA prepare() and is waiting on thread 1 in FineGrainedISMLocking.acquireWriteLock().

      Thread 1's save calls SharedItemStateManager.Update#end() and downgrades its write lock to a read lock, then (at the end of Update#end()) it calls readLock.release(). FineGrainedISMLocking.ReadLockImpl#release sees that activeWriterId belongs to the current transaction and therefore does not notify any waiting writers (activeWriterId is not reset on downgrade, which appears to be related to JCR-2753).
      Thread 2 waits forever.
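      The failure mode above can be modeled in a few lines. This is a minimal sketch with hypothetical names (SketchLock, writersNotified), not the actual FineGrainedISMLocking internals: the release path skips the writer notification whenever the releasing id still matches activeWriterId, which after a downgrade it always does.

```java
// Minimal model of the reported deadlock. All names here are hypothetical
// simplifications of the FineGrainedISMLocking logic, not the real code.
public class DowngradeDeadlockSketch {

    static class SketchLock {
        private Object activeWriterId;           // id of the writing transaction
        private boolean writersNotified = false; // did release wake waiting writers?

        void acquireWriteLock(Object id) {
            activeWriterId = id;                 // thread 1: Session.save() takes the write lock
        }

        void downgrade() {
            // Models the bug: activeWriterId is intentionally NOT cleared here.
        }

        void releaseReadLock(Object id) {
            if (id.equals(activeWriterId)) {
                // Release believes the active writer is still running,
                // so waiting writers (thread 2) are never woken up.
                return;
            }
            writersNotified = true;
        }

        boolean writersNotified() {
            return writersNotified;
        }
    }

    public static void main(String[] args) {
        SketchLock lock = new SketchLock();
        lock.acquireWriteLock("thread-1"); // Session.save()
        lock.downgrade();                  // Update#end(): write lock -> read lock
        lock.releaseReadLock("thread-1");  // end of Update#end()
        // Waiting writers were never notified, so thread 2 would hang forever:
        System.out.println("writers notified: " + lock.writersNotified());
    }
}
```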

        Activity

        Jukka Zitting added a comment -

        Fixed in revision 1066059 by keeping count of active readers, and clearing the activeWriterId when all reader and writer locks have been released. Merged to the 2.2 branch in revision 1066061.

        The activeWriterId needs to be left non-null for a downgraded write lock so that a concurrent reader in the same transaction can re-enter the lock even if there is another writer waiting. See JCR-2753 for more background.
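        The shape of that fix can be sketched roughly as follows, again with simplified, hypothetical names (SketchLock, activeReaders) rather than the real revision 1066059 change: the downgraded writer is counted as a reader, activeWriterId stays non-null so same-transaction readers can re-enter, and it is cleared (with writers notified) only when the last lock is released.

```java
// Rough sketch of the fix: keep a reader count and clear activeWriterId
// only when all reader and writer locks have been released. Hypothetical
// names; not the actual FineGrainedISMLocking implementation.
public class ReaderCountSketch {

    static class SketchLock {
        private Object activeWriterId;
        private int activeReaders = 0;

        void acquireWriteLock(Object id) {
            activeWriterId = id;
        }

        // Downgrade leaves activeWriterId non-null so readers of the same
        // transaction can still re-enter, and counts the writer as a reader.
        void downgrade() {
            activeReaders++;
        }

        boolean canReenterAsReader(Object id) {
            return id.equals(activeWriterId);
        }

        // Returns true when waiting writers would be notified.
        boolean releaseReadLock() {
            activeReaders--;
            if (activeReaders == 0) {
                activeWriterId = null; // cleared only once nothing is held
                return true;           // now notify waiting writers
            }
            return false;
        }
    }

    public static void main(String[] args) {
        SketchLock lock = new SketchLock();
        lock.acquireWriteLock("tx-1");
        lock.downgrade();
        System.out.println("same-transaction re-entry: " + lock.canReenterAsReader("tx-1"));
        System.out.println("writers notified on release: " + lock.releaseReadLock());
    }
}
```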

        Yoav Landman added a comment -

        I am seeing similar behavior with 2.2.1 (it took me a while to get around to upgrading so I could test this again).
        A thread (thread 1 from my other comment) is blocked in org.apache.jackrabbit.core.state.FineGrainedISMLocking.acquireWriteLock (FineGrainedISMLocking.java:143). Debugging shows that the activeWriterId in this frame actually belongs to a (pooled) thread that is idle and no longer doing any JCR activity.
        I am not closely familiar with the design, so perhaps I'm completely off, but it is unclear to me why writers that are downgraded to readers do not clean up the activeWriterId (in addition to nullifying the activeWriter, FineGrainedISMLocking.java:191). This seems to cause waiting writers not to be notified when the read lock is released. Adding activeWriterId = null to WriteLockImpl.downgrade() seems to fix the problem for me while passing all the jackrabbit-core tests.

        pool-2-thread-27@8112, prio=5, in group 'main', status: 'waiting'
        java.lang.Thread.State: WAITING
        at java.lang.Object.wait(Object.java:-1)
        at java.lang.Object.wait(Object.java:485)
        at EDU.oswego.cs.dl.util.concurrent.Latch.acquire(Unknown Source:-1)
        at org.apache.jackrabbit.core.state.FineGrainedISMLocking.acquireWriteLock(FineGrainedISMLocking.java:143) <-- activeWriterId is of a thread that is no longer active
        at org.apache.jackrabbit.core.state.SharedItemStateManager.acquireWriteLock(SharedItemStateManager.java:1850)
        at org.apache.jackrabbit.core.state.SharedItemStateManager.access$200(SharedItemStateManager.java:115)
        at org.apache.jackrabbit.core.state.SharedItemStateManager$Update.begin(SharedItemStateManager.java:565)
        at org.apache.jackrabbit.core.state.SharedItemStateManager.beginUpdate(SharedItemStateManager.java:1459)
        at org.apache.jackrabbit.core.state.XAItemStateManager.prepare(XAItemStateManager.java:163)
        at org.apache.jackrabbit.core.TransactionContext.prepare(TransactionContext.java:157)

        - locked <0x2041> (a org.apache.jackrabbit.core.TransactionContext)
        at org.apache.jackrabbit.core.XASessionImpl.prepare(XASessionImpl.java:312)
        at org.springframework.extensions.jcr.jackrabbit.support.JackRabbitUserTransaction.commit(JackRabbitUserTransaction.java:91)
        at org.springframework.extensions.jcr.jackrabbit.LocalTransactionManager.doCommit(LocalTransactionManager.java:189)
        at org.artifactory.jcr.JcrTransactionManager.doCommit(JcrTransactionManager.java:75)
        at org.springframework.transaction.support.AbstractPlatformTransactionManager.processCommit(AbstractPlatformTransactionManager.java:754)
        at org.springframework.transaction.support.AbstractPlatformTransactionManager.commit(AbstractPlatformTransactionManager.java:723)
        at org.springframework.transaction.interceptor.TransactionAspectSupport.commitTransactionAfterReturning(TransactionAspectSupport.java:393)
        at org.springframework.transaction.interceptor.TransactionInterceptor.invoke(TransactionInterceptor.java:120)
        ...
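        The one-line change proposed in this comment would look roughly like the sketch below (hypothetical names again). Note the trade-off flagged elsewhere in this issue: clearing activeWriterId unconditionally on downgrade unblocks waiting writers, but it also means a concurrent reader in the same transaction can no longer be recognized for re-entry, which is why the committed fix clears the id only after the last lock is released.

```java
// Sketch of the proposed workaround: nullify activeWriterId on downgrade
// so a subsequent read-lock release notifies waiting writers. Hypothetical
// simplified names, not the actual WriteLockImpl.downgrade() code.
public class DowngradeNullSketch {

    static class SketchLock {
        private Object activeWriterId;

        void acquireWriteLock(Object id) {
            activeWriterId = id;
        }

        // Proposed change: clear the writer id as part of the downgrade.
        void downgrade() {
            activeWriterId = null;
        }

        // Returns true when waiting writers would be notified.
        boolean releaseReadLock(Object id) {
            // activeWriterId no longer matches, so the release path
            // falls through to notifying waiting writers.
            return !id.equals(activeWriterId);
        }
    }

    public static void main(String[] args) {
        SketchLock lock = new SketchLock();
        lock.acquireWriteLock("thread-1");
        lock.downgrade();
        System.out.println("writers notified: " + lock.releaseReadLock("thread-1"));
    }
}
```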
        Jukka Zitting added a comment -

        You tested this with 2.1.3, right? This seems related to JCR-2820, but is probably a slightly different problem since JCR-2820 is already fixed in 2.1.3.

        Can you still reproduce the problem with 2.2.x? If you can, please post a thread dump to make it easier to analyze the problem.


          People

          • Assignee: Jukka Zitting
          • Reporter: Yoav Landman
          • Votes: 0
          • Watchers: 1
