CAY-1782: Deadlock when performing many concurrent inserts

    Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 3.0, 3.1 (final), 3.2M1
    • Fix Version/s: 3.2.M2
    • Component/s: Core Library
    • Labels: None

      Description

      I've encountered a deadlock issue in production in an app performing many INSERTs. The deadlock was between the PK generator and the PoolManager (getting a DB connection). It is very bad. I added a unit test demonstrating the problem and a fix for it.

      The fix is possibly not ideal because it requires a larger data structure for holding the cached primary keys, but it is far better than the previous behavior.

      If this fix is acceptable, it should be back-ported to 3.1 as well.

        Activity

        Andrus Adamchik added a comment - edited

        I can't reproduce a deadlock per se, only connection pool exhaustion (so if you have stack traces proving an actual deadlock, I'd appreciate you posting them), but I am all for removing synchronization from the PK generator. Big +1 on the effort.

        > The fix is possibly not ideal because it requires a larger data structure for holding the cached primary keys, but it is far better than the previous behavior.

        This is indeed performance-sensitive code. Can we use AtomicLong (or a wrapper around AtomicLong) to store a range of ids? As an added benefit, it won't require synchronization when retrieving a key.
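
        For illustration, a minimal sketch of the AtomicLong-range idea (the PkRange class, its names, and the -1 exhaustion convention are hypothetical, not Cayenne API):

        import java.util.concurrent.atomic.AtomicLong;

        // Hypothetical sketch: a DB-allocated id range backed by AtomicLong,
        // so handing out a key needs no synchronized block.
        class PkRange {
            private final long last;           // last id in the allocated range
            private final AtomicLong next;     // next id to hand out

            PkRange(long first, long last) {
                this.last = last;
                this.next = new AtomicLong(first);
            }

            // Returns the next id, or -1 when the range is exhausted and the
            // caller must fetch a fresh range from the database.
            long nextPk() {
                long id = next.getAndIncrement();
                return id <= last ? id : -1;
            }
        }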

        (I edited this comment to remove my initial concerns about our attempts to avoid gaps... this may really not be about avoiding gaps, but about minimizing DB trips for PK generation).

        Andrus Adamchik added a comment - edited

        Aside from the discussion of a better implementation...

        1. Should we synchronize on 'pks' here instead of 'pkCache'?

        synchronized (pkCache) {
            value = pks.first();
            pks.remove(value);
        }

        Or maybe replace the set with something like ConcurrentLinkedQueue, as the DB guarantees us uniqueness of "our" range, and the order of keys is not relevant?

        2. while (!pkCache.replace(entity.getName(), pks, nextPks))

        I suggest we add some last-resort exit condition in case our thread never gets a chance to set its PK. Maybe a for loop from 0 to 999, and then throw an exception?

        John Huss added a comment

        You're right, it is connection exhaustion, not deadlock per se, since it does not stay locked forever. But it is deadlock in the sense that two threads have obtained locks and are waiting for each other - one of them just doesn't wait forever. As I've thought about it more, there may be a different/better fix, if I understand it correctly. But I'm not very familiar with this code and I'm only addressing it out of necessity. My understanding is this:

        • First thread tries to save and checks out a DB connection from the pool
        • First thread realizes it needs to generate a primary key and locks the pkCache
        • Second thread tries to save and checks out a DB connection from the pool
        • First thread tries to check out ANOTHER connection from the pool to run the SQL to generate a PK (the problem is that it should use the connection it already has).
        • Second thread realizes it needs to generate a primary key and waits to lock the pkCache - stuck
        • First thread can't obtain a connection because the second thread has one and IT already has one!

        I'm not positive on that, but it's my best shot at the moment. If that is the case, then the problem could be avoided by allowing the PK generator code to use an existing database connection passed in as a parameter rather than having it obtain one itself.

        Andrus Adamchik added a comment

        PK generation happens before the commit thread grabs the connection for the commit operation. However, the important part is that the entire commit (including PK generation) is done in the context of a thread-bound Transaction. The DataNode.TransactionDataSource inner class ensures that each thread uses at most one connection per DataNode. The connection is reserved lazily and stays "reserved" to that thread until the end of the transaction.
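
        To illustrate the pattern (a simplified sketch, not Cayenne's actual TransactionDataSource code): a wrapper caches one connection per thread, reserving it lazily on first use and holding it until the transaction ends.

        import java.sql.Connection;
        import java.sql.SQLException;
        import javax.sql.DataSource;

        // Simplified sketch of a thread-bound DataSource wrapper.
        class ThreadBoundDataSource {
            private final DataSource delegate;
            private final ThreadLocal<Connection> reserved = new ThreadLocal<Connection>();

            ThreadBoundDataSource(DataSource delegate) {
                this.delegate = delegate;
            }

            Connection getConnection() throws SQLException {
                Connection c = reserved.get();
                if (c == null) {
                    c = delegate.getConnection(); // reserved lazily, on first use
                    reserved.set(c);
                }
                return c; // the same connection for every call on this thread
            }

            // Called at the end of the transaction: return the connection to the pool.
            void release() throws SQLException {
                Connection c = reserved.get();
                if (c != null) {
                    reserved.remove();
                    c.close();
                }
            }
        }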

        John Huss added a comment

        Here is an accurate account of the original issue, after stepping through it more:

        • Thread 1 does commitChanges (2 objects will be inserted)
        • Thread 1 checks out a DB connection to generate the PK for the first object
        • Thread 1 gets the PK and exits the pkCache lock
        • Thread 2 does commitChanges (2 objects will be inserted)
        • Thread 2 enters the pkCache lock (which is open now)
        • Thread 2 waits (20 seconds) for a DB connection to generate a PK (Thread 1 still has the connection)
        • Thread 1 tries to generate a PK for its second object. It already has a DB connection, but it cannot enter the pkCache lock while Thread 2 holds it
        • Deadlock (for 20 seconds)

        Here are the stack traces:

        Thread t@43523: (state = BLOCKED)

              • This thread is waiting for the pkCache lock
        • org.apache.cayenne.dba.JdbcPkGenerator.generatePk(org.apache.cayenne.access.DataNode, org.apache.cayenne.map.DbAttribute) @bci=110, line=257 (Interpreted frame)
        • org.apache.cayenne.access.DataDomainInsertBucket.createPermIds(org.apache.cayenne.access.DbEntityClassDescriptor, java.util.Collection) @bci=313, line=171 (Interpreted frame)
              • This thread holds a DB connection that it obtained while generating a PK for the first object it inserted; now it is trying to handle the second object
        • org.apache.cayenne.access.DataDomainInsertBucket.appendQueriesInternal(java.util.Collection) @bci=161, line=76 (Interpreted frame)
        • org.apache.cayenne.access.DataDomainSyncBucket.appendQueries(java.util.Collection) @bci=18, line=78 (Interpreted frame)
        • org.apache.cayenne.access.DataDomainFlushAction.preprocess(org.apache.cayenne.access.DataContext, org.apache.cayenne.graph.GraphDiff) @bci=173, line=188 (Interpreted frame)
        • org.apache.cayenne.access.DataDomainFlushAction.flush(org.apache.cayenne.access.DataContext, org.apache.cayenne.graph.GraphDiff) @bci=138, line=144 (Interpreted frame)
        • org.apache.cayenne.access.DataDomain.onSyncFlush(org.apache.cayenne.ObjectContext, org.apache.cayenne.graph.GraphDiff) @bci=59, line=710 (Interpreted frame)
        • org.apache.cayenne.access.DataDomain$2.transform(java.lang.Object) @bci=12, line=674 (Interpreted frame)
        • org.apache.cayenne.access.DataDomain.runInTransaction(org.apache.commons.collections.Transformer) @bci=25, line=734 (Interpreted frame)
        • org.apache.cayenne.access.DataDomain.onSyncNoFilters(org.apache.cayenne.ObjectContext, org.apache.cayenne.graph.GraphDiff, int) @bci=73, line=671 (Interpreted frame)
        • org.apache.cayenne.access.DataDomain$DataDomainSyncFilterChain.onSync(org.apache.cayenne.ObjectContext, org.apache.cayenne.graph.GraphDiff, int) @bci=32, line=873 (Interpreted frame)
        • org.apache.cayenne.access.DataDomain.onSync(org.apache.cayenne.ObjectContext, org.apache.cayenne.graph.GraphDiff, int) @bci=15, line=642 (Interpreted frame)
        • org.apache.cayenne.access.DataContext.flushToParent(boolean) @bci=93, line=758 (Interpreted frame)
        • org.apache.cayenne.access.DataContext.commitChanges() @bci=2, line=697 (Interpreted frame)
        • org.apache.cayenne.dba.ConcurrentPkGeneratorTest$1.run() @bci=58, line=68 (Interpreted frame)
        • java.util.concurrent.Executors$RunnableAdapter.call() @bci=4, line=441 (Interpreted frame)
        • java.util.concurrent.FutureTask$Sync.innerRun() @bci=30, line=303 (Interpreted frame)
        • java.util.concurrent.FutureTask.run() @bci=4, line=138 (Interpreted frame)
        • java.util.concurrent.ThreadPoolExecutor$Worker.runTask(java.lang.Runnable) @bci=59, line=886 (Interpreted frame)
        • java.util.concurrent.ThreadPoolExecutor$Worker.run() @bci=28, line=908 (Interpreted frame)
        • java.lang.Thread.run() @bci=11, line=680 (Interpreted frame)

        Thread t@43779: (state = BLOCKED)

              • This thread is waiting for a DB connection
        • java.lang.Object.wait(long) @bci=0 (Interpreted frame)
        • org.apache.cayenne.conn.PoolManager.uncheckPooledConnection(java.lang.String, java.lang.String) @bci=38, line=433 (Interpreted frame)
        • org.apache.cayenne.conn.PoolManager.getConnection(java.lang.String, java.lang.String) @bci=21, line=372 (Interpreted frame)
        • org.apache.cayenne.conn.PoolManager.getConnection() @bci=9, line=361 (Interpreted frame)
        • org.apache.cayenne.access.DataNode$TransactionDataSource.getConnection() @bci=83, line=348 (Interpreted frame)
        • org.apache.cayenne.access.DataNode.performQueries(java.util.Collection, org.apache.cayenne.access.OperationObserver) @bci=71, line=260 (Interpreted frame)
        • org.apache.cayenne.dba.JdbcPkGenerator.longPkFromDatabase(org.apache.cayenne.access.DataNode, org.apache.cayenne.map.DbEntity) @bci=96, line=308 (Interpreted frame)
        • org.apache.cayenne.dba.JdbcPkGenerator.generatePk(org.apache.cayenne.access.DataNode, org.apache.cayenne.map.DbAttribute) @bci=171, line=266 (Interpreted frame)
              • This thread holds the pkCache lock
        • org.apache.cayenne.access.DataDomainInsertBucket.createPermIds(org.apache.cayenne.access.DbEntityClassDescriptor, java.util.Collection) @bci=313, line=171 (Interpreted frame)
        • org.apache.cayenne.access.DataDomainInsertBucket.appendQueriesInternal(java.util.Collection) @bci=161, line=76 (Interpreted frame)
        • org.apache.cayenne.access.DataDomainSyncBucket.appendQueries(java.util.Collection) @bci=18, line=78 (Interpreted frame)
        • org.apache.cayenne.access.DataDomainFlushAction.preprocess(org.apache.cayenne.access.DataContext, org.apache.cayenne.graph.GraphDiff) @bci=173, line=188 (Interpreted frame)
        • org.apache.cayenne.access.DataDomainFlushAction.flush(org.apache.cayenne.access.DataContext, org.apache.cayenne.graph.GraphDiff) @bci=138, line=144 (Interpreted frame)
        • org.apache.cayenne.access.DataDomain.onSyncFlush(org.apache.cayenne.ObjectContext, org.apache.cayenne.graph.GraphDiff) @bci=59, line=710 (Interpreted frame)
        • org.apache.cayenne.access.DataDomain$2.transform(java.lang.Object) @bci=12, line=674 (Interpreted frame)
        • org.apache.cayenne.access.DataDomain.runInTransaction(org.apache.commons.collections.Transformer) @bci=25, line=734 (Interpreted frame)
        • org.apache.cayenne.access.DataDomain.onSyncNoFilters(org.apache.cayenne.ObjectContext, org.apache.cayenne.graph.GraphDiff, int) @bci=73, line=671 (Interpreted frame)
        • org.apache.cayenne.access.DataDomain$DataDomainSyncFilterChain.onSync(org.apache.cayenne.ObjectContext, org.apache.cayenne.graph.GraphDiff, int) @bci=32, line=873 (Interpreted frame)
        • org.apache.cayenne.access.DataDomain.onSync(org.apache.cayenne.ObjectContext, org.apache.cayenne.graph.GraphDiff, int) @bci=15, line=642 (Interpreted frame)
        • org.apache.cayenne.access.DataContext.flushToParent(boolean) @bci=93, line=758 (Interpreted frame)
        • org.apache.cayenne.access.DataContext.commitChanges() @bci=2, line=697 (Interpreted frame)
        • org.apache.cayenne.dba.ConcurrentPkGeneratorTest$1.run() @bci=58, line=68 (Interpreted frame)
        • java.util.concurrent.Executors$RunnableAdapter.call() @bci=4, line=441 (Interpreted frame)
        • java.util.concurrent.FutureTask$Sync.innerRun() @bci=30, line=303 (Interpreted frame)
        • java.util.concurrent.FutureTask.run() @bci=4, line=138 (Interpreted frame)
        • java.util.concurrent.ThreadPoolExecutor$Worker.runTask(java.lang.Runnable) @bci=59, line=886 (Interpreted frame)
        • java.util.concurrent.ThreadPoolExecutor$Worker.run() @bci=28, line=908 (Interpreted frame)
        • java.lang.Thread.run() @bci=11, line=680 (Interpreted frame)

        So the "locks" really have to be obtained together - if you can obtain the pkCache lock, but you don't have a DB connection yet, then you can cause a deadlock because you might be waiting to reuse a connection from a thread that is waiting to lock the pkCache. You can avoid the problem by moving the database call (longPkFromDatabase) OUT of the pkCache synchronized block so that the two activities don't block each other.

        The downside of doing that is that you may generate primary keys faster than you need them, because multiple threads may each try to refill the same exhausted range, and then, unless you change the prior implementation, one of the generated ranges would just be thrown away. So you want to allow concurrent PK generation but avoid wasting entire ranges - by using a larger data structure, i.e. a COLLECTION of ranges instead of a single range. Implementation-wise, maintaining a mutable collection of mutable ranges would be hairy, so I elected to decompose the range into its elements, which allows you to use a simple collection (at the expense of using more memory).

        I've implemented the changes you suggested above - using a Queue instead (nice) and safeguarding the loop that replaces the map entry.

        Andrus Adamchik added a comment

        Now I fully understand the deadlock. Indeed things look pretty bad if you have a starved connection pool.

        A few ideas re: the new implementation:

        long val = longPkFromDatabase(node, entity);
        Queue<Long> nextPks = mkRange(val, val + cacheSize - 1);
        int iterations = 0;
        while (!pkCache.replace(entity.getName(), pks, nextPks) && iterations < 999) {
            // the cache for this entity has changed, so re-fetch it, then update
            pks = pkCache.get(entity.getName());
            Queue<Long> previousPlusNext = new ConcurrentLinkedQueue<Long>(pks);
            previousPlusNext.addAll(nextPks);
            nextPks = previousPlusNext;
            iterations++;
        }

        if (iterations >= 999) {
            throw new IllegalStateException("Unable to add new primary keys to the cache for entity " + entity.getName());
        }

        pks = nextPks;
        }

        value = pks.poll();

        1. Since we have a concurrent queue, do we have to clone and replace it every time? I think when we get a new range of keys, we can simply append it to the end of the existing queue (based on the earlier assertion that the order of keys is irrelevant).

        2. The thread that hit 'longPkFromDatabase' can actually immediately grab the returned value for itself, and then append the remaining part of the range to the queue. This way the thread that called 'longPkFromDatabase' will never be left without a PK (if, say, the queue is drained faster than we are able to call 'poll'), and we also avoid an extra pair of add/poll calls.

        John Huss added a comment - edited

        You're right, good ideas. Here's that portion reimplemented:

        Queue<Long> pks = pkCache.get(entity.getName());

        if (pks == null) {
            // create an empty queue (the equivalent of an exhausted LongPkRange)
            pks = new ConcurrentLinkedQueue<Long>();
            Queue<Long> previousPks = pkCache.putIfAbsent(entity.getName(), pks);
            if (previousPks != null) {
                pks = previousPks;
            }
        }

        if (pks.isEmpty()) {
            value = longPkFromDatabase(node, entity);
            for (long i = value + 1; i < value + cacheSize; i++) {
                pks.add(i);
            }
        } else {
            value = pks.poll();
        }
        Andrus Adamchik added a comment - edited

        IMO looks great. Minor reorg suggestion of the second if/else:

        value = pks.poll();
        if (value == null) {
            value = longPkFromDatabase(node, entity);
            for (long i = value + 1; i < value + cacheSize; i++) {
                pks.add(i);
            }
        }

        This is possible due to the queue.poll() behavior: "Retrieves and removes the head of this queue, or returns null if this queue is empty." Although that would require us to redefine 'value' as Long instead of long.
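
        Putting the pieces of this thread together, the flow would look roughly like this (a consolidated sketch; pkCache, longPkFromDatabase, and cacheSize are the names used in the snippets above, while the method signature is illustrative rather than the committed code):

        // Assumes: ConcurrentMap<String, Queue<Long>> pkCache;
        //          long longPkFromDatabase(DataNode node, DbEntity entity);
        //          int cacheSize;
        protected Long nextLongPk(DataNode node, DbEntity entity) throws Exception {
            Queue<Long> pks = pkCache.get(entity.getName());
            if (pks == null) {
                pks = new ConcurrentLinkedQueue<Long>();
                Queue<Long> previous = pkCache.putIfAbsent(entity.getName(), pks);
                if (previous != null) {
                    pks = previous;
                }
            }

            Long value = pks.poll();
            if (value == null) {
                // Queue drained: fetch a new range from the DB, keep the first id
                // for this thread, and share the rest via the concurrent queue.
                long first = longPkFromDatabase(node, entity);
                for (long i = first + 1; i < first + cacheSize; i++) {
                    pks.add(i);
                }
                value = first;
            }
            return value;
        }

        The DB call happens outside any synchronized block, so a thread waiting on the connection pool can never hold up a thread that already has keys in the queue.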

        Brian Dickinson added a comment

        I am just making a note that revision 1426022 for this issue isn't visible on the Source tab.

        Andrus Adamchik added a comment

        @Brian Dickinson: Are you checking Cayenne sources via GitHub? It is certainly in SVN. I am not sure about the Git synchronization policies of Apache repos, though.

        Brian Dickinson added a comment

        I found it via SVN, but it is not on the Source tab of this page. The commit comment is just missing a dash: (CAY 1782) instead of (CAY-1782). I just thought I'd make a note so someone doesn't miss the change if they are merging.

        Andrus Adamchik added a comment

        Got it now. Thanks!


          People

          • Assignee: John Huss
          • Reporter: John Huss
          • Votes: 0
          • Watchers: 3