Issue Details (XML | Word | Printable)

Key: DBCP-212
Type: Improvement Improvement
Status: Resolved Resolved
Resolution: Fixed
Priority: Major Major
Assignee: Unassigned
Reporter: Marcos Sanz
Votes: 0
Watchers: 0
Operations

If you were logged in you would be able to see more operations.
Commons Dbcp

PoolingDataSource closes physical connections

Created: 05/Apr/07 03:24 PM   Updated: 21/Jun/09 05:28 PM
Return to search
Component/s: None
Affects Version/s: 1.2.2
Fix Version/s: 1.3

Time Tracking:
Not Specified

File Attachments:
  Size
Java Source File DBCPtester.java 2007-04-10 09:56 AM Marcos Sanz 2 kB
Java Source File DBCPtester.java 2007-04-05 03:28 PM Marcos Sanz 2 kB
Text File output.txt 2007-04-11 10:58 AM Marcos Sanz 19 kB
Environment: Windows XP, Java 1.5.0_06-b05, Sybase ASE 12.5.4, jConnect 6.0.5 EBF 13862, Commons Pool 1.3
Issue Links:
Blocker
 
Reference
 

Resolution Date: 21/Jun/09 05:28 PM


 Description  « Hide
By executing the attached program and monitoring the process id of the physical connections at the database server, it is possible to demonstrate that the connections are being actually physically closed and reopened by the application at a very high rate.

 All   Comments   Work Log   Change History   Subversion Commits      Sort Order: Ascending order - Click to sort in descending order
Marcos Sanz added a comment - 05/Apr/07 03:28 PM
Execution of the programm does NOT produce any exception, error or relevant output.

Phil Steitz added a comment - 06/Apr/07 09:27 AM
Can you please provide a little more information on what exactly you are seeing - how exactly you can tell that the connections are being physically closed?

I ran the DBCPTester code against postgres and Sybase SQL Anywhere (developer edition, with the JConnect driver above) and did not see connection leakage or churn. In each case, I examined connection logs and there did not appear to be connect-disconnects happening. Postgres did not churn processes and the logs showed just 15 connects, and disconnects only when the program was terminated. Same for Sybase. With Sybase, I did notice a lot of executions of sp_dba. This is because the pool periodically checks isClosed(). The JConnect docs (http://infocenter.sybase.com/help/index.jsp?topic=/com.sybase.dc39001_0605/html/prjdbc/BHCIAIBF.htm) explain why this triggers sp_dba. Specifying IS_CLOSED_TEST=INTERNAL eliminates this.

Are you running Commons Pool 1.3? With dbcp 1.2.2, you should be. Could you pls also test using pool 1.2 with dbcp 1.2.1? I don't see any difference in my tests using the previous versions and the changes from pool 1.2 / dbcp 1.2.1 to pool 1.3 / dbcp 1.2.2 should not affect this; but it would be good to know if the behavior you are seeing is new.

Thanks!


Marcos Sanz added a comment - 06/Apr/07 01:59 PM
Yes, the test has been made with Commons Pool 1.3

It is possible to determine that connections are being physically closed/reopened in that the process IDs and the login dates at the master..sysprocesses table in the database server are continuously changing for this application's connections. The number of connections is not stable = 15 either, it varies between 12-15.

I'll rerun the tests on Tuesday with older versions of pool/dbcp. I'll try to capture some TDS traffic with RIBO, as well.

You wrote "I examined the connection logs". Is there some debug-switch I can turn on and that would help you debugging?


Phil Steitz added a comment - 06/Apr/07 05:15 PM
Thanks. Also thanks in advance for running against pool 1.2/dbcp 1.2.1 to verify same behavior.
The log I examined for Sybase was the server log with the (SQL Anywhere 10) server started with the -zr option set to "SQL" or "ALL". That appears to show connects / disconnects, as well as sp execution.
I don't know much about Sybase, but from the docs, it looks like if you can snoop the TDS traffic, you see much more than I can from the logs.
All my tests were on Linux (Fedora Core 4) Sun JDK 1.5.0_10, JConnect 6.0.5.
I also ran against postgres, which spawns an OS process for each connection, and verified from its logs as well as from ps output that there was no process churn (PIDs did not change).

Phil Steitz added a comment - 09/Apr/07 07:42 AM
After running against instrumented dbcp and pool jars and different db engines, I still can't replicate this problem. Both client and server logs are showing only the expected number of opens and closes on physical connections. I suspect that the connections must either being closed on the server side or somehow JConnect's isClosed impl is returning true when the connection is not closed (though it is working for me against SQL Anywhere). One more thing to try is setting IS_CLOSED_TEST=INTERNAL.

Marcos Sanz added a comment - 10/Apr/07 09:52 AM
The issue can be reproduced with pool 1.2/dbcp 1.2.1/collections 3.2. It can also be reproduced with another driver different of jConnect.

I've isolated the source of the problem. It is the debugging line

System.out.println(connectionPool.getMaxActive());

which I had in my original code (see new attachment).

It looks as if the synchronized in GenericObjectPool.getMaxActive() method would be originating the physical opens/closes, but I still don't know why.


Marcos Sanz added a comment - 10/Apr/07 09:56 AM
New file with additional debugging that actually causes the bug.

Marcos Sanz added a comment - 10/Apr/07 04:08 PM
Indeed, the problem is triggered by the synchronized methods in GenericObjectPool, like getNumActive() and getNumIdle(), often called by applications. These methods serialize all executing threads to a single code point, thus the number of unused objects in the pool increases at some moments and eventually reaches the maxIdle value, which triggers the destroyObject(), causing physical closes() of database connections.

This is very cumbersome, and nearly a paradox: there are indeed one thousand threads in the example concurring for fifteen database connections, yet the pool decides the connections are idle.

One solution to this problem can be the application always setting maxIdle := maxActive, though this has side effects. I am wondering whether it could be possible to make these private fields (_maxActive, _numActive, ...) volatile and get rid of the synchronized in the methods. I have to explore this. In the meantime, I think this issue belongs to Commons-Pool, rather than to Commons-DBCP.


Phil Steitz added a comment - 11/Apr/07 05:47 AM
Thanks. By adding the System.out, I can reproduce the symptoms. You are correct that it is the pool's returnObject method that is triggering closing of the excess idle physical connections. The default maxIdle for the pool is 8. I think that what is causing the threads to "stack up", however, is not contention for locks on the pool, but contention on the relatively slow and synchronized System.out.println(). When I replace System.out.println(connectionPool.getMaxActive()) with just connectionPool.getMaxActive(), the idle connection count to does not grow. On the other hand, replacing it with System.out.println("hello") leads to the idle object build up.

Marcos Sanz added a comment - 11/Apr/07 10:57 AM
Not true. Even if I drop the System.out call and maintain the getMaxActive(), the situation persists. I still can observe the physical closes, you just have to wait a bit longer (in the order of minutes for my system, but YMMV) for the situation to appear. And if you wait a bit longer, exceptions start occurring (see attachment).

The concurrency control concept and, as a result, the liveness of the GenericObjectPool is catastrophical. And I am not talking only about the synchronized counter accesses. How comes that expensive (database) operations like makeObject() are locking the instance of the pool when this is not needed at all?


Phil Steitz added a comment - 12/Apr/07 04:45 AM
Thanks, Marcos. Really appreciate your feedback and help chasing this down. I should have thought a little more about what is going on here before responding. What I now think is happening is that either as a result of contention for locks on the pool (via getMaxActive) or buffer flushing in System.out, the active threads are periodically closing connections faster than they are (collectively) opening them. This makes the number of idle connections periodically exceed maxIdle, they get closed and then load increases again, so new ones get created. This could happen in production for lots of different reasons.

I am leaving this open as a dbcp bug, since until pool is improved somehow to be more intelligent about destroying idle objects, dbcp is prone to this behavior. The workaround in situations where load and open/close rates can flucutate is to set maxIdle = maxActive, or at least to avoid maxIdle << maxActive.

The documentation of both pool and dbcp should be improved to call out the fact that the default value of maxIdle is 8 and what this setting really means.

I am also marking this as related to DBCP-205 (maybe a duplicate?). The same Windows socket error was reported there, in that case also associated with connection churn, and the resolution of that issue requires the same kind of enhancement to pool.


Marcos Sanz added a comment - 13/Apr/07 01:26 PM
Yes, I also have the feeling that DBCP-205 is an aspect of this issue, appearing because of exhaustion of ephimeral ports on the client side due to the masive number of operations on the physical connections.

Marcos Sanz added a comment - 23/Apr/07 02:04 PM
Regarding concurrency control of GenericObjectPool, I have submitted a patch (see issue POOL-93) that improves the borrow/return times of the pool. DBCP-212 is somehow hereby alleviated, but it isn't still fixed.

It was not worth tweaking GenericObjectPool.borrowObject() any further because at the moment the next bottleneck of the test program is DBCP's method PoolableConnectionFactory.makeObject() (which is called from within borrowObject()): When the pool reaches a certain size and needs to be increased, all threads get stack at the factory entrance. Is there any compelling reason for that method to be synchronized?


Phil Steitz added a comment - 20/Jun/07 07:46 PM
PoolableConnectionFactory.makeObject() has been synchronized since the initial commit. Synchronization of validateObject was removed in r132110 in response to BZ 25096. Can anyone else see a reason that this synchronization is necessary?

Dain Sundstrom added a comment - 23/Jul/07 07:04 PM
Fixed some time ago.

Marcos Sanz added a comment - 23/Jul/07 09:12 PM
Not really fixed. As Phil wrote: "until pool is improved somehow to be more intelligent about destroying idle objects, dbcp is prone to this behavior".
I'd like to keep this open so as not to forget this. The question remains: Has PoolableConnectionFactory.makeObject() necessarily be synchronized?

Mark Thomas added a comment - 22/Feb/09 07:39 PM
I can't see a good reason for the sync on makeObject() so I removed it. It does seem to do a little to alleviate the problem but the issue is still significant.

Mark Thomas added a comment - 22/Feb/09 08:10 PM
I have to add the sync back in - without it, driver registration was broken. Once I get to the bottom of why, I'll take the sync back out if I can.

Mark Thomas added a comment - 22/Feb/09 10:44 PM
I'm changing the status of this to enhancement as the current code is working as designed, even if it could be smarter.

If GenericObjectPool.addObjectToPool took account of the number of waiting threads then this issue would be easy to solve. That number isn't currently available but if POOL-75 is implemented then the number of waiting threads could be exposed.


Mark Thomas added a comment - 21/Jun/09 05:28 PM
The numerous sync changes made in POOL 1.5.x should eliminate the remaining issues reported here. In particular:
  • no factory method is called from within a sync block
  • borrowObject() uses syncs only where necessary
  • the implementation of fair queuing ensures that an object will be allocated to a waiting thread when it is returned

DBCP 1.3 will include POOL 1.5.1. In the mean time, you should still see most of the benefit by switching to POOL 1.5.1 or later