Commons Pool / POOL-209

SharedPoolDataSource does not handle null keys correctly

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Cannot Reproduce
    • Affects Version/s: None
    • Fix Version/s: None
    • Labels: None
    • Environment: Windows, java version "1.6.0_06", Torque 3.3 (DBCP 1.4, jTDS 1.2.5)

    Description

    GenericKeyedObjectPool.borrowObject does not invalidate an object when validateObject fails (around line 1200). I suspect invalidation is needed for objects that are not newly created (the newlyCreated flag in the code). A sketch of the expected borrow-time handling follows the scenario below.

    Scenario:
    1.) A DB connection is put into the pool.
    2.) An attempt is made to reuse the connection (borrowObject).
    3.) The test on borrow (executing a simple SQL query) fails: the DB connection is closed at the jTDS level, but the connection remains in the pool.
    4.) Subsequent attempts to reuse the connection cause "java.sql.SQLException: Invalid state, the Connection object is closed".
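
    Below is a minimal sketch of the borrow-time contract at issue, with illustrative names (SimpleKeyedPool, Factory) that are not the actual Commons Pool API: an instance that fails validateObject should be destroyed, never handed out or left in the idle pool.

    import java.util.ArrayDeque;
    import java.util.Deque;
    import java.util.HashMap;
    import java.util.Map;

    // Illustrative factory contract, mirroring the shape (not the exact API)
    // of Commons Pool's KeyedPoolableObjectFactory.
    interface Factory<K, V> {
        V makeObject(K key) throws Exception;
        boolean validateObject(K key, V obj);
        void destroyObject(K key, V obj) throws Exception;
    }

    public class SimpleKeyedPool<K, V> {
        private final Map<K, Deque<V>> idle = new HashMap<K, Deque<V>>();
        private final Factory<K, V> factory;

        public SimpleKeyedPool(Factory<K, V> factory) { this.factory = factory; }

        public synchronized void returnObject(K key, V obj) {
            Deque<V> queue = idle.get(key);
            if (queue == null) { queue = new ArrayDeque<V>(); idle.put(key, queue); }
            queue.push(obj);
        }

        // Test-on-borrow: a candidate is removed from the idle pool before it is
        // validated, and is destroyed (never re-pooled) if validation fails.
        public V borrowObject(K key) throws Exception {
            for (;;) {
                V candidate;
                synchronized (this) {                  // lock only the queue access
                    Deque<V> queue = idle.get(key);
                    candidate = (queue == null) ? null : queue.poll();
                }
                if (candidate == null) {
                    return factory.makeObject(key);    // nothing idle: create fresh
                }
                if (factory.validateObject(key, candidate)) {
                    return candidate;                  // healthy instance: hand out
                }
                factory.destroyObject(key, candidate); // stale connection: drop it
            }
        }
    }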

        Activity

        Phil Steitz added a comment -

        Thanks for looking into the code, but I think your problem is likely somewhere else. When an object borrowed from the pool fails validation, it is destroyed and not returned to the pool. The code is a little complicated, but the object being examined in lines 1195-1228 of the 1.5.4 source is either newly created or has been removed from the pool and assigned to the client thread (i.e., it is no longer in the idle object pool, so it will not be returned to the current thread or any other client until it passes validation). The actual removal from the pool happens in the allocate method, where the instance is attached to the latch associated with the client thread.

        Can you please provide a stack trace of the exception above?
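
        As a rough sketch of the hand-off just described (illustrative, not the 1.5.4 source), the latch is a per-client holder: allocate() pins an idle instance to it, at which point the instance has left the idle pool and only the owning client thread can validate or discard it.

        // Simplified latch holder; the real class carries more state, but the
        // shape is the same: all access to the mutable members is synchronized.
        final class Latch<V> {
            private V pair;              // instance assigned to this client
            private boolean mayCreate;   // client may create a new instance

            synchronized void setPair(V v) { pair = v; notifyAll(); }
            synchronized V getPair() { return pair; }
            synchronized void setMayCreate(boolean b) { mayCreate = b; notifyAll(); }
            synchronized boolean mayCreate() { return mayCreate; }
        }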

        Gabor Horvath added a comment -

        Thanks for the quick answer.
        Yes, you are right; I also checked the allocate() method. I tried to trace where the problem comes from; I am not sure, but it seems it all starts with this validation failure (I just put a printStackTrace into the DBCP code):

        java.sql.SQLException: PooledConnection was reused, without its previous Connection being closed.
        at org.apache.commons.dbcp.cpdsadapter.PooledConnectionImpl.getConnection(PooledConnectionImpl.java:180)
        at org.apache.commons.dbcp.datasources.KeyedCPDSConnectionFactory.validateObject(KeyedCPDSConnectionFactory.java:180)
        at org.apache.commons.pool.impl.GenericKeyedObjectPool.borrowObject(GenericKeyedObjectPool.java:1197)
        at org.apache.commons.dbcp.datasources.SharedPoolDataSource.getPooledConnectionAndInfo(SharedPoolDataSource.java:181)
        at org.apache.commons.dbcp.datasources.InstanceKeyDataSource.getConnection(InstanceKeyDataSource.java:701)
        at org.apache.commons.dbcp.datasources.InstanceKeyDataSource.getConnection(InstanceKeyDataSource.java:676)
        at org.apache.torque.TorqueInstance.getConnection(TorqueInstance.java:912)
        at org.apache.torque.Torque.getConnection(Torque.java:314)
        at org.apache.torque.util.Transaction.beginOptional(Transaction.java:108)
        at org.apache.torque.util.BasePeer.doSelect(BasePeer.java:753)
        ...
        at java.lang.Thread.run(Thread.java:619)

        And a little while later:

        org.apache.torque.TorqueException: java.sql.SQLException: Invalid state, the Connection object is closed.
        at org.apache.torque.util.Transaction.commit(Transaction.java:153)
        at org.apache.torque.util.BasePeer.doSelect(BasePeer.java:757)
        ...
        at java.lang.Thread.run(Thread.java:619)
        Caused by: java.sql.SQLException: Invalid state, the Connection object is closed.
        at net.sourceforge.jtds.jdbc.ConnectionJDBC2.checkOpen(ConnectionJDBC2.java:1699)
        at net.sourceforge.jtds.jdbc.ConnectionJDBC2.getMetaData(ConnectionJDBC2.java:2293)
        at org.apache.commons.dbcp.DelegatingConnection.getMetaData(DelegatingConnection.java:346)
        at org.apache.torque.util.Transaction.commit(Transaction.java:144)
        ... 7 more

        Phil Steitz added a comment -

        Thanks for digging into this further. The top stack trace is troubling and may point to a DBCP bug. I don't see anything wrong with the pool part of this, so I moved the issue to DBCP. The Torque client is calling getConnection on SharedPoolDataSource with no username or password, which means a <null, null> key (I need to look into the Torque code to verify that this is normal and what it means). In DBCP 1.4, as part of the changes to support password changes (DBCP-8), equality among UserPassKeys was changed to identify keys with the same username. I don't yet have an explanation for the trace. I will dig into the Torque code this weekend and see if I can figure out what is going on.
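
        To make the key semantics concrete, here is a hedged sketch of username-only key equality in the spirit of DBCP's UserPassKey after DBCP-8 (the body is illustrative, not the actual DBCP source): all no-credential borrows produce a key with a null username, so equality and hashing must handle null explicitly, and every such borrow maps to the same pool key.

        // Illustrative key: equality ignores the password and compares usernames
        // only, with explicit null handling so a <null, null> key is well-defined.
        final class UserPassKey {
            private final String username;
            private final String password;   // deliberately excluded from equality

            UserPassKey(String username, String password) {
                this.username = username;
                this.password = password;
            }

            @Override public boolean equals(Object o) {
                if (this == o) return true;
                if (!(o instanceof UserPassKey)) return false;
                UserPassKey other = (UserPassKey) o;
                return username == null ? other.username == null
                                        : username.equals(other.username);
            }

            @Override public int hashCode() {
                return username == null ? 0 : username.hashCode();
            }
        }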

        Gabor Horvath added a comment -

        I went deeper into the commons-pool code and found some problematic points:

        1.) Creating the new object for a latch is not synchronized, so there is a chance that another thread overwrites the latch's pair (GenericKeyedObjectPool:1177).
        2.) The allocate() method does not check whether the latch's pair was already assigned as in 1.) (~line 1245).
        3.) I suspect latch accesses should be synchronized to avoid the latch changing between lines.

        I attached my changes; please review them if you have time. My tests pass now.
        Thanks a lot.

        Mark Thomas added a comment -

        -1 for the patch as currently written. It includes calls to factory methods inside sync blocks, which will cause deadlocks, as documented in the comments at the top of the class.

        It would help reviewers if you provided a little more detail on the sequence of events that can lead to a threading problem for issues 1) and 2). It may also help to provide separate patches to address them. As for 3), I will veto any patch based on a 'guess' that there might be a problem, or on guesses at what the problem might be. Patches need to be backed up with hard evidence.

        The code for this and GenericObjectPool is very similar. If there are issues here, there are probably issues in GOP that also need to be patched.

        For future reference, patches should be provided in diff -u format, as this requires significantly less effort to review.

        Gabor Horvath added a comment -

        Thanks for checking. I really should have considered the threading issues; I didn't catch the related comments in the header...

        In a nutshell, I had a problem with connection pooling. The layers above get connections in an invalid state; the symptom is quite difficult to reproduce, but it happens under heavy load. I will try to provide a test application, maybe next week.

        For 1 and 2, I just checked the code and found the part where a new object is created without synchronization; this can have a side effect if the latch remains in the allocation queue. As I saw a chance of that while tracing, I synchronized it and also changed the allocate method to handle the situation where an object is already assigned to the latch... I will try to create patches for these.

        Regarding 3.), you are totally right.

        Phil Steitz added a comment -

        Gabor: can you post some more info on the DBCP config? In particular: is testOnReturn false, how are maxIdle and minIdle set, is the evictor enabled, and are you using DriverAdapterCPDS (it looks like yes)? Thanks!

        Gabor Horvath added a comment -

        Hi, I use Torque; the pool-related configuration is:

        torque.dsfactory.user.pool.maxActive=-1
        torque.dsfactory.user.pool.maxIdle=-1
        torque.dsfactory.user.pool.timeBetweenEvictionRunsMillis=30000
        torque.dsfactory.user.pool.minEvictableIdleTimeMillis=60000
        torque.dsfactory.user.pool.testOnBorrow=true
        torque.dsfactory.user.pool.testWhileIdle=false

        Other parameters are not specified in my configuration, so they should be set to the DBCP defaults (testOnReturn=false, minIdle=0, numTestsPerEvictionRun=3, ...). And yes, DriverAdapterCPDS is used via Torque. Thanks a lot!

        Gabor Horvath added a comment -

        I attached a new version of GenericKeyedObjectPool; all 243 provided unit tests pass. Sorry for not providing a diff; I currently do not have it installed.

        borrowObject and allocate changed as follows:

        - The latch is used for synchronization in code blocks where the latch's mutable members are accessed.
        - 'this' is used for synchronization where the allocation queue is accessed.
        - Factory method calls are now outside of synchronized blocks: the latch is removed from the allocation queue before the factory methods are called, to avoid concurrent modifications.

        I would highly appreciate your review. Thanks.

        Phil Steitz added a comment -

        Thanks, Gabor. It's hard to evaluate your patch as attached, for three reasons.

        1) It includes quite a few changes and is not in diff format. It would make it much easier for us if you could install the Subversion client, check out the code in trunk, and generate a patch using "svn diff" against your checkout. See the instructions here: http://commons.apache.org/patches.html and do not hesitate to ask privately or on the mailing list if you need help getting set up.

        2) It would help to explain in more detail why each of the changes is necessary - e.g., the change to add/remove the latch to/from the allocation queue each time through the loop.

        3) It would really help to have a test case that fails before the patch and succeeds after. Absent that, a clear description of an execution sequence that leads to a problem is necessary. Ideally, this explanation should account for your production stack traces.

        Thanks again for reporting the problem that you are seeing and for your suggestions on improving the code.

        Gabor Horvath added a comment -

        Phil, thanks for your response. I will provide patches in line with your process, probably next week.

        Please ignore any source files I may upload here; they are just a memo for myself.

        Phil Steitz added a comment -

        Thanks. That will make it easier. As I said above, it would also be great if you could explain why each change is necessary and how exactly the current code can fail. The change I called out above - pulling the latch out of the queue each time through the loop - is problematic, as it will in general not preserve fairness, which is the reason the queue is there.

        Gabor Horvath added a comment -

        I partially agree with your comment.

        At the point the latch is removed from the allocation queue, we can be sure the client will be served (assuming validation succeeds):
        - if allocation was successful, the allocated object is returned
        - if allocation failed, a new object is created

        Fairness is broken only in the case where a newly created object could not be validated, so requeueing is needed - that is a real difference. For the remaining scenarios my code works the same, I think: if allocation is successful, the latch is removed from the queue anyway.

        My proposal to fix this is to add a new Latch member indicating whether the latch still must be served: instead of removing the latch from the queue, we just set the flag, which allocate() would take into account. The allocate() method would also set this flag accordingly instead of removing the latch from the queue. The latch would be removed only when borrowObject returns. This way fairness is maintained for the latch's whole lifetime (see the sketch below). What do you think?

        Thanks for the catch!
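
        A sketch of what that proposal could look like (all names illustrative, not from the actual pool code): the latch keeps its queue position for its whole lifetime, a 'claimed' flag makes allocate() skip latches that have already been served, and the latch is physically dequeued only when borrowObject() returns, so queue order, and therefore fairness, survives a failed validation and re-serve.

        import java.util.ArrayDeque;
        import java.util.Deque;

        public class FairAllocator<V> {
            static final class Latch<T> {
                private T pair;
                private boolean claimed;   // true once this latch has been served
                synchronized void serve(T t) { pair = t; claimed = true; notifyAll(); }
                synchronized void unclaim() { pair = null; claimed = false; } // re-serve after failed validation
                synchronized boolean isClaimed() { return claimed; }
            }

            private final Deque<Latch<V>> queue = new ArrayDeque<Latch<V>>();
            private final Deque<V> idle = new ArrayDeque<V>();

            synchronized void allocate() {
                for (Latch<V> latch : queue) {        // walk in arrival order
                    if (latch.isClaimed()) continue;  // served: skip, but keep its position
                    V instance = idle.poll();
                    if (instance == null) break;      // nothing left to hand out
                    latch.serve(instance);
                }
            }

            // Called when borrowObject() returns: only now does the latch leave the queue.
            synchronized void finished(Latch<V> latch) { queue.remove(latch); }
        }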

        Kevin Schmidt added a comment -

        I am seeing this issue in the 1.4 release. I use jTDS as well and am seeing the exact behavior that Gabor Horvath has reported. This appears to be a concurrency issue, since it only happens under heavy server load.

        Gabor, have you found any workarounds? Are you using the patch in production?

        Configuration Options
        maxActive=200
        minEvictableIdleTimeMillis=1800000
        numTestsPerEvictionRun=5
        testOnBorrow=0
        testOnReturn=1
        testWhileIdle=1
        timeBetweenEvictionRunsMillis=60000
        whenExhaustedAction=0
        preferredTestQuery=select db_id()

        Thanks,
        Kevin

        Greg Roberts added a comment -

        I've just started getting "Attempted to use PooledConnection after closed() was called." errors after moving to jTDS 1.2.5 and dbcp 1.4.

        I'm also using Torque 3.3 and java 1.6.

        Has there been any resolution to this problem?

        Is there a workaround/fix of some kind?

        Thanks.

        Mark Thomas added a comment -

        If you see this issue, please also provide details of the version of Commons Pool that you are using.

        Mark Thomas added a comment -

        I have been reviewing the 1.5.4 and 1.5.x pool code against the 3 issues identified above:

        1.) Creating the new object for a latch is not synchronized, so there is a chance that another thread overwrites the latch's pair (GenericKeyedObjectPool:1177).
        2.) The allocate() method does not check whether the latch's pair was already assigned as in 1.) (~line 1245).
        3.) I suspect latch accesses should be synchronized to avoid the latch changing between lines.

        re 3: I intend to ignore this, as per my previous comments, until such time as it is substantiated with an explanation of how a problem may occur.

        re 1: I disagree with the analysis. Setting mayCreate on the latch is performed inside a sync block on the keyed pool. At the same time as mayCreate is set, the latch is removed from the allocation queue. Therefore, another thread cannot provide an object for that latch. Object creation is performed in the borrowObject() method, and the latch is a local variable there, so it is only visible to a single thread. I do not see a sequence of events that could result in two threads creating an object for the same latch. (See the sketch after this comment.)

        re 2: Same argument as 1.

        All the latch getters and setters are synchronized, so any attempt to read latch attributes will see the latest values even if the write and the read happen in separate threads.

        So far there have been several explanations of how this issue may occur but, as yet, none of them has stood up to scrutiny (assuming my analysis is correct - not always a valid assumption). I wouldn't be surprised if there were a pool bug here somewhere - the code is somewhat complex - but I'm going to need either a test case that demonstrates the issue or an explanation of how the problem occurs before I start changing any code.

        There have been a few fixes since 1.5.4 (although none that look relevant to this issue), so it would be good to update to the latest 1.5.x (or 1.6.x if you want generics) and repeat the tests.
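
        The invariant described in "re 1" can be sketched as follows (simplified, with illustrative names, not the 1.5.4 source): dequeuing a latch and granting it either an idle instance or permission to create one happen inside a single synchronized block on the pool, so no second thread can assign anything to a latch once it has left the queue.

        import java.util.ArrayDeque;
        import java.util.Deque;

        public class AtomicHandOff<V> {
            static final class Latch<T> {
                private T pair;
                private boolean mayCreate;
                synchronized void setPair(T t) { pair = t; notifyAll(); }
                synchronized void setMayCreate(boolean b) { mayCreate = b; notifyAll(); }
            }

            private final Deque<Latch<V>> allocationQueue = new ArrayDeque<Latch<V>>();
            private final Deque<V> idle = new ArrayDeque<V>();
            private int createBudget = 8;   // illustrative capacity limit

            synchronized void allocate() {   // one lock covers dequeue + assignment
                while (!allocationQueue.isEmpty()) {
                    V instance = idle.poll();
                    if (instance != null) {
                        // Dequeue and assign atomically: once the latch leaves the
                        // queue, only its owning client thread can touch it.
                        allocationQueue.removeFirst().setPair(instance);
                    } else if (createBudget > 0) {
                        createBudget--;
                        // mayCreate is likewise granted in the same critical section.
                        allocationQueue.removeFirst().setMayCreate(true);
                    } else {
                        break;               // nothing to hand out; latches keep waiting
                    }
                }
            }
        }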

        Mark Thomas added a comment -

        No further information provided in almost a year.

        If you still see this issue with a current Commons pool version (currently 1.5.7 or 1.6.0) please feel free to re-open this issue.


          People

          • Assignee: Unassigned
          • Reporter: Gabor Horvath
          • Votes: 1
          • Watchers: 3
