Commons Dbcp / DBCP-212

PoolingDataSource closes physical connections

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.2.2
    • Fix Version/s: 1.3
    • Labels:
      None
    • Environment:

      Windows XP, Java 1.5.0_06-b05, Sybase ASE 12.5.4, jConnect 6.0.5 EBF 13862, Commons Pool 1.3

      Description

      By executing the attached program and monitoring the process IDs of the physical connections at the database server, it is possible to demonstrate that the application is actually physically closing and reopening connections at a very high rate.
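
      The attachment itself is not reproduced on this page. The following is a minimal sketch of the kind of harness DBCPtester.java presumably implements, based on the class names and numbers mentioned in the comments (15 pooled connections, many competing worker threads, the getMaxActive() debugging line); the Sybase URL, driver class and exact DBCP 1.2.x constructor arguments are assumptions, not the attachment's actual contents.

      // Hypothetical reconstruction of the DBCPtester setup, not the actual attachment.
      import java.sql.Connection;
      import org.apache.commons.dbcp.ConnectionFactory;
      import org.apache.commons.dbcp.DriverManagerConnectionFactory;
      import org.apache.commons.dbcp.PoolableConnectionFactory;
      import org.apache.commons.dbcp.PoolingDataSource;
      import org.apache.commons.pool.impl.GenericObjectPool;

      public class DBCPTesterSketch {
          public static void main(String[] args) throws Exception {
              Class.forName("com.sybase.jdbc3.jdbc.SybDriver");             // assumed driver class

              final GenericObjectPool connectionPool = new GenericObjectPool();
              connectionPool.setMaxActive(15);                              // 15 physical connections
              connectionPool.setWhenExhaustedAction(GenericObjectPool.WHEN_EXHAUSTED_BLOCK);

              ConnectionFactory connectionFactory = new DriverManagerConnectionFactory(
                      "jdbc:sybase:Tds:host:5000/db", "user", "pwd");       // placeholder URL and credentials
              // Registers itself with connectionPool as a side effect (DBCP 1.2.x API).
              new PoolableConnectionFactory(connectionFactory, connectionPool, null, null, false, true);

              final PoolingDataSource dataSource = new PoolingDataSource(connectionPool);

              for (int i = 0; i < 1000; i++) {                              // many competing threads
                  new Thread(new Runnable() {
                      public void run() {
                          while (true) {
                              try {
                                  Connection conn = dataSource.getConnection();
                                  // The debugging line that, per the comments below, triggers the churn:
                                  System.out.println(connectionPool.getMaxActive());
                                  conn.close();                             // returns the connection to the pool
                              } catch (Exception e) {
                                  e.printStackTrace();
                              }
                          }
                      }
                  }).start();
              }
          }
      }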

      Attachments

      1. DBCPtester.java (2 kB, Marcos Sanz)
      2. DBCPtester.java (2 kB, Marcos Sanz)
      3. output.txt (19 kB, Marcos Sanz)

        Issue Links

        relates to DBCP-205

          Activity

          Mark Thomas added a comment -

          The numerous sync changes made in POOL 1.5.x should eliminate the remaining issues reported here. In particular:

          • no factory method is called from within a sync block
          • borrowObject() uses syncs only where necessary
          • the implementation of fair queuing ensures that an object will be allocated to a waiting thread when it is returned

          DBCP 1.3 will include POOL 1.5.1. In the meantime, you should still see most of the benefit by switching to POOL 1.5.1 or later.
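
          For illustration only (this is not the actual POOL source), the following sketch shows the structural difference the sync changes describe: the expensive factory call is no longer made while holding the pool lock, so one slow physical connect no longer stalls every other borrower.

          // Illustrative sketch, not GenericObjectPool code.
          import java.util.LinkedList;

          class BorrowSketch<T> {
              interface Factory<T> { T makeObject() throws Exception; }

              private final LinkedList<T> idle = new LinkedList<T>();
              private final Factory<T> factory;
              private int numActive;

              BorrowSketch(Factory<T> factory) { this.factory = factory; }

              // Pre-1.5 style: creation happens under the lock, so every borrower
              // queues behind a slow makeObject().
              synchronized T borrowOldStyle() throws Exception {
                  if (!idle.isEmpty()) { numActive++; return idle.removeFirst(); }
                  numActive++;
                  return factory.makeObject();   // expensive connect inside the sync block
              }

              // 1.5.x style: only the bookkeeping is synchronized; the connect runs outside it.
              T borrowNewStyle() throws Exception {
                  synchronized (this) {
                      if (!idle.isEmpty()) { numActive++; return idle.removeFirst(); }
                      numActive++;               // reserve a slot while holding the lock
                  }
                  return factory.makeObject();   // physical connect with no pool lock held
              }
          }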

          Mark Thomas added a comment -

          I'm changing the status of this to enhancement as the current code is working as designed, even if it could be smarter.

          If GenericObjectPool.addObjectToPool took account of the number of waiting threads then this issue would be easy to solve. That number isn't currently available but if POOL-75 is implemented then the number of waiting threads could be exposed.
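
          For illustration only (not the actual pool code), a sketch of what a waiter-aware return path could look like, assuming the number of waiting threads were exposed as POOL-75 suggests:

          // Illustrative sketch: keep a returned object whenever threads are still waiting,
          // even if the idle count has already reached maxIdle.
          import java.util.LinkedList;

          class ReturnSketch<T> {
              interface Destroyer<T> { void destroyObject(T obj) throws Exception; }

              private final LinkedList<T> idle = new LinkedList<T>();
              private int maxIdle = 8;      // the pool's default
              private int waiters;          // placeholder; would be maintained by borrowObject()

              synchronized void returnObject(T obj, Destroyer<T> destroyer) throws Exception {
                  if (idle.size() < maxIdle || waiters > 0) {
                      idle.addLast(obj);    // keep it: there is idle room or someone is waiting
                      notify();
                  } else {
                      destroyer.destroyObject(obj);   // only now pay for a physical close
                  }
              }
          }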

          Mark Thomas added a comment -

          I have to add the sync back in - without it, driver registration was broken. Once I get to the bottom of why, I'll take the sync back out if I can.

          Mark Thomas added a comment -

          I can't see a good reason for the sync on makeObject(), so I removed it. That does seem to alleviate the problem a little, but the issue is still significant.

          Marcos Sanz added a comment -

          Not really fixed. As Phil wrote: "until pool is improved somehow to be more intelligent about destroying idle objects, dbcp is prone to this behavior".
          I'd like to keep this open so as not to forget it. The question remains: does PoolableConnectionFactory.makeObject() really have to be synchronized?

          Dain Sundstrom added a comment -

          Fixed some time ago.

          Phil Steitz added a comment -

          PoolableConnectionFactory.makeObject() has been synchronized since the initial commit. Synchronization of validateObject was removed in r132110 in response to BZ 25096. Can anyone else see a reason that this synchronization is necessary?

          Marcos Sanz added a comment -

          Regarding concurrency control of GenericObjectPool, I have submitted a patch (see issue POOL-93) that improves the borrow/return times of the pool. DBCP-212 is thereby somewhat alleviated, but it still isn't fixed.

          It was not worth tweaking GenericObjectPool.borrowObject() any further, because at the moment the next bottleneck of the test program is DBCP's method PoolableConnectionFactory.makeObject() (which is called from within borrowObject()): when the pool reaches a certain size and needs to be increased, all threads get stuck at the factory entrance. Is there any compelling reason for that method to be synchronized?
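
          As a self-contained illustration (not DBCP code) of threads stacking up at a synchronized factory method, the following toy benchmark compares a synchronized and an unsynchronized stand-in for the physical connect:

          // Toy demonstration only: a synchronized "connect" serializes all callers,
          // while the unsynchronized version lets them overlap.
          public class FactoryContentionDemo {
              static synchronized void synchronizedConnect() throws InterruptedException {
                  Thread.sleep(100);   // stand-in for a physical JDBC connect
              }

              static void unsynchronizedConnect() throws InterruptedException {
                  Thread.sleep(100);
              }

              static long run(final boolean sync, int threads) throws InterruptedException {
                  Thread[] workers = new Thread[threads];
                  long start = System.currentTimeMillis();
                  for (int i = 0; i < threads; i++) {
                      workers[i] = new Thread(new Runnable() {
                          public void run() {
                              try {
                                  if (sync) synchronizedConnect(); else unsynchronizedConnect();
                              } catch (InterruptedException ignored) { }
                          }
                      });
                      workers[i].start();
                  }
                  for (Thread t : workers) t.join();
                  return System.currentTimeMillis() - start;
              }

              public static void main(String[] args) throws InterruptedException {
                  System.out.println("synchronized:   " + run(true, 10) + " ms");   // roughly 10 x 100 ms
                  System.out.println("unsynchronized: " + run(false, 10) + " ms");  // roughly 100 ms
              }
          }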

          Marcos Sanz added a comment -

          Yes, I also have the feeling that DBCP-205 is an aspect of this issue, appearing because of exhaustion of ephemeral ports on the client side due to the massive number of operations on the physical connections.

          Phil Steitz added a comment -

          Thanks, Marcos. Really appreciate your feedback and help chasing this down. I should have thought a little more about what is going on here before responding. What I now think is happening is that either as a result of contention for locks on the pool (via getMaxActive) or buffer flushing in System.out, the active threads are periodically closing connections faster than they are (collectively) opening them. This makes the number of idle connections periodically exceed maxIdle, they get closed and then load increases again, so new ones get created. This could happen in production for lots of different reasons.

          I am leaving this open as a dbcp bug, since until pool is improved somehow to be more intelligent about destroying idle objects, dbcp is prone to this behavior. The workaround in situations where load and open/close rates can fluctuate is to set maxIdle = maxActive, or at least to avoid maxIdle << maxActive.
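
          A minimal sketch of that workaround with the commons-pool 1.x setters (the pool size of 15 is taken from the test program; adjust to your own configuration):

          import org.apache.commons.pool.impl.GenericObjectPool;

          class MaxIdleWorkaround {
              static GenericObjectPool newPool() {
                  GenericObjectPool pool = new GenericObjectPool();
                  pool.setMaxActive(15);   // the 15 connections used by the test program
                  pool.setMaxIdle(15);     // maxIdle == maxActive: a returned connection is never
                                           // destroyed just because the idle count momentarily
                                           // exceeds the default maxIdle of 8
                  return pool;
              }
          }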

          The documentation of both pool and dbcp should be improved to call out the fact that the default value of maxIdle is 8 and what this setting really means.

          I am also marking this as related to DBCP-205 (maybe a duplicate?). The same Windows socket error was reported there, in that case also associated with connection churn, and the resolution of that issue requires the same kind of enhancement to pool.

          Marcos Sanz added a comment -

          Not true. Even if I drop the System.out call and keep the getMaxActive(), the situation persists. I can still observe the physical closes; you just have to wait a bit longer (on the order of minutes on my system, but YMMV) for the situation to appear. And if you wait a bit longer still, exceptions start occurring (see attachment).

          The concurrency control design and, as a result, the liveness of GenericObjectPool are catastrophic. And I am not talking only about the synchronized counter accesses. How come expensive (database) operations like makeObject() lock the pool instance when this is not needed at all?

          Phil Steitz added a comment -

          Thanks. By adding the System.out, I can reproduce the symptoms. You are correct that it is the pool's returnObject method that is triggering the closing of the excess idle physical connections. The default maxIdle for the pool is 8. I think that what is causing the threads to "stack up", however, is not contention for locks on the pool, but contention on the relatively slow and synchronized System.out.println(). When I replace System.out.println(connectionPool.getMaxActive()) with just connectionPool.getMaxActive(), the idle connection count does not grow. On the other hand, replacing it with System.out.println("hello") leads to the idle object build-up.

          Marcos Sanz added a comment -

          Indeed, the problem is triggered by the synchronized methods in GenericObjectPool, like getNumActive() and getNumIdle(), which are often called by applications. These methods serialize all executing threads at a single code point, so the number of unused objects in the pool grows at some moments and eventually reaches the maxIdle value, which triggers destroyObject() and causes physical closes of database connections.

          This is very cumbersome, and nearly a paradox: there are one thousand threads in the example competing for fifteen database connections, yet the pool decides the connections are idle.

          One solution to this problem is for the application to always set maxIdle := maxActive, though this has side effects. I am wondering whether it would be possible to make these private fields (_maxActive, _numActive, ...) volatile and get rid of the synchronized in the methods; I have to explore this. In the meantime, I think this issue belongs to Commons-Pool rather than to Commons-DBCP.
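
          For illustration only (not the real GenericObjectPool), a sketch of that idea: making the counters volatile lets the read-only accessors run without taking the pool monitor, so monitoring calls such as getNumActive() stop competing with borrow/return for the same lock. Writes would still need coordination, since volatile alone does not make increments atomic.

          // Illustrative sketch of synchronized vs. volatile counter accessors.
          class CounterSketch {
              // Synchronized accessor: every caller competes for the same monitor
              // that borrowObject()/returnObject() would use.
              private int numActiveLocked;
              synchronized int getNumActiveLocked() { return numActiveLocked; }

              // Volatile accessor: reads are lock-free and still see the latest value;
              // updates to the field must still happen under a lock or via an atomic.
              private volatile int numActiveVolatile;
              int getNumActiveVolatile() { return numActiveVolatile; }
          }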

          Marcos Sanz added a comment -

          New file with additional debugging that actually causes the bug.

          Marcos Sanz added a comment -

          The issue can be reproduced with pool 1.2/dbcp 1.2.1/collections 3.2. It can also be reproduced with a driver other than jConnect.

          I've isolated the source of the problem. It is the debugging line

          System.out.println(connectionPool.getMaxActive());

          which I had in my original code (see new attachment).

          It looks as if the synchronized on the GenericObjectPool.getMaxActive() method is causing the physical opens/closes, but I still don't know why.

          Phil Steitz added a comment -

          After running against instrumented dbcp and pool jars and different db engines, I still can't replicate this problem. Both client and server logs are showing only the expected number of opens and closes on physical connections. I suspect that the connections must either be getting closed on the server side, or that somehow JConnect's isClosed implementation is returning true when the connection is not closed (though it is working for me against SQL Anywhere). One more thing to try is setting IS_CLOSED_TEST=INTERNAL.
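
          A sketch of how that property could be passed through DBCP's DriverManagerConnectionFactory; the URL and credentials are placeholders, and the property name comes from the jConnect documentation cited in the comments below:

          import java.util.Properties;
          import org.apache.commons.dbcp.ConnectionFactory;
          import org.apache.commons.dbcp.DriverManagerConnectionFactory;

          class JConnectClosedTest {
              static ConnectionFactory newConnectionFactory() {
                  Properties props = new Properties();
                  props.setProperty("user", "user");                // placeholder credentials
                  props.setProperty("password", "pwd");
                  props.setProperty("IS_CLOSED_TEST", "INTERNAL");  // isClosed() no longer round-trips to the server
                  return new DriverManagerConnectionFactory("jdbc:sybase:Tds:host:5000/db", props);
              }
          }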

          Phil Steitz added a comment -

          Thanks. Also thanks in advance for running against pool 1.2/dbcp 1.2.1 to verify same behavior.
          The log I examined for Sybase was the server log with the (SQL Anywhere 10) server started with the -zr option set to "SQL" or "ALL". That appears to show connects / disconnects, as well as sp execution.
          I don't know much about Sybase, but from the docs, it looks like if you can snoop the TDS traffic, you see much more than I can from the logs.
          All my tests were on Linux (Fedora Core 4) Sun JDK 1.5.0_10, JConnect 6.0.5.
          I also ran against postgres, which spawns an OS process for each connection, and verified from its logs as well as from ps output that there was no process churn (PIDs did not change).

          Marcos Sanz added a comment -

          Yes, the test was made with Commons Pool 1.3.

          It is possible to determine that connections are being physically closed/reopened because the process IDs and the login dates in the master..sysprocesses table on the database server are continuously changing for this application's connections. The number of connections is not stable at 15 either; it varies between 12 and 15.
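
          A sketch of that kind of monitoring from JDBC; the column names (spid, program_name) are assumptions for Sybase ASE, and the connection details and driver class are placeholders:

          import java.sql.Connection;
          import java.sql.DriverManager;
          import java.sql.ResultSet;
          import java.sql.Statement;

          public class SysprocessesMonitor {
              public static void main(String[] args) throws Exception {
                  Class.forName("com.sybase.jdbc3.jdbc.SybDriver");   // assumed driver class
                  Connection conn = DriverManager.getConnection(
                          "jdbc:sybase:Tds:host:5000/master", "sa", "pwd");
                  Statement stmt = conn.createStatement();
                  while (true) {
                      ResultSet rs = stmt.executeQuery("select spid, program_name from master..sysprocesses");
                      while (rs.next()) {
                          System.out.println(rs.getInt("spid") + " " + rs.getString("program_name"));
                      }
                      rs.close();
                      Thread.sleep(5000);   // spids that change between polls indicate physical reconnects
                  }
              }
          }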

          I'll rerun the tests on Tuesday with older versions of pool/dbcp. I'll try to capture some TDS traffic with RIBO, as well.

          You wrote "I examined the connection logs". Is there some debug switch I can turn on that would help you with debugging?

          Phil Steitz added a comment -

          Can you please provide a little more information on what exactly you are seeing - how exactly you can tell that the connections are being physically closed?

          I ran the DBCPTester code against postgres and Sybase SQL Anywhere (developer edition, with the JConnect driver above) and did not see connection leakage or churn. In each case, I examined connection logs and there did not appear to be connect-disconnects happening. Postgres did not churn processes and the logs showed just 15 connects, and disconnects only when the program was terminated. Same for Sybase. With Sybase, I did notice a lot of executions of sp_dba. This is because the pool periodically checks isClosed(). The JConnect docs (http://infocenter.sybase.com/help/index.jsp?topic=/com.sybase.dc39001_0605/html/prjdbc/BHCIAIBF.htm) explain why this triggers sp_dba. Specifying IS_CLOSED_TEST=INTERNAL eliminates this.

          Are you running Commons Pool 1.3? With dbcp 1.2.2, you should be. Could you please also test using pool 1.2 with dbcp 1.2.1? I don't see any difference in my tests using the previous versions, and the changes from pool 1.2 / dbcp 1.2.1 to pool 1.3 / dbcp 1.2.2 should not affect this; but it would be good to know if the behavior you are seeing is new.

          Thanks!

          Marcos Sanz added a comment -

          Execution of the program does NOT produce any exception, error or relevant output.


            People

            • Assignee:
              Unassigned
            • Reporter:
              Marcos Sanz
            • Votes:
              0
            • Watchers:
              0