Issue Details (XML | Word | Printable)

Key: POOL-149
Type: Bug Bug
Status: Closed Closed
Resolution: Fixed
Priority: Critical Critical
Assignee: Unassigned
Reporter: shuyang.zhou
Votes: 0
Watchers: 0
Operations

If you were logged in you would be able to see more operations.
Commons Pool

A serious concurrent bug can cause resource leak when Pool exhausted and borrowed objects are invalid

Created: 20/Aug/09 04:32 PM   Updated: 24/Sep/09 12:49 AM
Return to search
Component/s: None
Affects Version/s: 1.5, 1.5.1, 1.5.2
Fix Version/s: 1.5.3

Time Tracking:
Not Specified

File Attachments:
  Size
Java Source File BugTest.java 2009-08-24 05:42 AM shuyang.zhou 3 kB

Resolution Date: 09/Sep/09 01:12 PM


 Description  « Hide
This bug will happen when the pool is in exhausted state and the borrowed object are invalid.

Let's go through a simple scenario:
1)A GenericObjectPool with _maxActive==1, whenExhaustedAction==WHEN_EXHAUSTED_BLOCK
2)Two threads using that pool, called thread1 and thread2

Here is error path:
1)thread1 calls pool.borrowObject() to get the object out of the pool
//now the pool is exhausted
2)thread2 calls pool.borrowObject(), adds a new latch to the _allocationQueue, but before it enters the synchronized block for WHEN_EXHAUSTED_BLOCK(GenericObjectPool line 1099 revision 806215), context-switch happens
3)thread1 checks the object from pool, and decides to invalidate it. So it calls pool.invalidateObject(), which calls allocate(), then calls latch.notify(), but currently no thread is waiting on this latch.(thread2 has not enter the wait synchronized block yet).
4)Then thread2 will wait there for ever.(it just missed the notify)



 All   Comments   Work Log   Change History   Subversion Commits      Sort Order: Ascending order - Click to sort in descending order
Phil Steitz added a comment - 24/Aug/09 03:43 AM
This looks like a "latch leak" similar to POOL-147. It would be great to have a test case to demonstrate the reported issue. In any case, the execution sequence should not result in a deadlock; rather a resource leak and one thread blocking indefinitely.

shuyang.zhou added a comment - 24/Aug/09 05:17 AM
HI Phil Steitz,

My problem is different with POOL-147.
POOL-147 is a "double check" problem, i think.
But my problem is the "notify" comes earlier than its "wait", which will cause the "wait" wait for ever.

I will create a demo test case for this.


shuyang.zhou added a comment - 24/Aug/09 05:42 AM
Ok, there is no way to force the bad case to happen in a unit test.
I created a test case, and with it you can just hope the bug happens.

And i tested it several times, it always happens. But maybe you need to run it several time to get your luck.

Here is the instruction:
1)Run the test case, and look at your output console
2)Wait until it stops output timestamp(If you test case quit successfully, you need to restart from 1))
3)Use VisualVM to take a thread dump and heap dump

From the thread dump you should able to find one pooled thread wait on GenericObjectPool for ever like this:
"pool-267-thread-10" prio=10 tid=0x00007f58c4300800 nid=0x2892 in Object.wait() [0x00007f58c91a2000]
java.lang.Thread.State: WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
at java.lang.Object.wait(Object.java:485)
at org.apache.commons.pool.impl.GenericObjectPool.borrowObject(GenericObjectPool.java:1103)

  • locked <0x00007f58fb042db0> (a org.apache.commons.pool.impl.GenericObjectPool$Latch)
    at BugTest$TestJob.run(BugTest.java:72)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
    at java.util.concurrent.FutureTask.run(FutureTask.java:138)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:619)

Locked ownable synchronizers:

  • <0x00007f58fa67ff10> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)

From the heap dump you should able to see the GenericObjectPool is empty.

There is another way to see this bug easily, but you need a multithread enabled debuger, for example, Netbeans(See http://www.netbeans.org/kb/docs/java/debug-multithreaded.html)
In the test case set TEST_THREAD_NUM=2, MAX_ACTIVE=1, run the debuger with the instruction in ticket description, you can see how this bug happens easily.


shuyang.zhou added a comment - 24/Aug/09 05:50 AM
And this bug should be called resource leak, not deadlock. My bad.

Finally all threads try to borrow objects from the empty pool will block indefinitely.


Phil Steitz added a comment - 25/Aug/09 02:02 PM
Thanks, Shuyang!

The test case demonstrates the problem nicely. A simple fix for the specific case identified here is to recheck latch.mayCreate() inside the synch(latch) block, i.e. change GenericObjectPool line 1101 revision 792217 to

if (latch.getPair() == null && !latch.mayCreate())

That "fix" does not fully address the problem though, as it deals only with the control path called out in the bug report. We should look at all other latch state tests and scope of synchronization on latches.


shuyang.zhou added a comment - 25/Aug/09 02:52 PM
Just curious, do you guys have a plan to upgrade Commons Pool's jdk requirement to 1.5+ ?

By the concurrent package, you can avoid the fragile synchronized, wait and notify, and also use the spin lock instead of the old monitor lock to get a performance gain.


Phil Steitz added a comment - 25/Aug/09 02:57 PM
Yes, in pool 2.0.

shuyang.zhou added a comment - 25/Aug/09 03:07 PM
Great! Hope it comes out soon

Filippo Diotalevi added a comment - 01/Sep/09 10:44 AM
Hi, I believe this problem is hitting me in pre-production, where all the threads are dealocked at

java.lang.Thread.State: WAITING (on object monitor)
at java.lang.Object.wait(Native Method)

  • waiting on <0x000000010a2a1258> (a org.apache.commons.pool.impl.GenericObjectPool)
    at java.lang.Object.wait(Object.java:485)
    at org.apache.commons.pool.impl.GenericObjectPool.borrowObject(GenericObjectPool.java:942)
  • locked <0x000000010a2a1258> (a org.apache.commons.pool.impl.GenericObjectPool)
    at org.apache.commons.dbcp.PoolingDataSource.getConnection(PoolingDataSource.java:96)
    at org.apache.commons.dbcp.BasicDataSource.getConnection(BasicDataSource.java:880)
    at sun.reflect.GeneratedMethodAccessor87.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.springframework.aop.support.AopUtils.invokeJoinpointUsingReflection(AopUtils.java:307)
    at org.springframework.osgi.service.importer.support.internal.aop.ServiceInvoker.doInvoke(ServiceInvoker.java:58)
    at org.springframework.osgi.service.importer.support.internal.aop.ServiceInvoker.invoke(S

Can I have an update on this issue? Is there a patch I can apply to workaround this problem?


shuyang.zhou added a comment - 01/Sep/09 11:04 AM
Hi Filippo,

No sure is your problem the same as the one i found. Because your thread dump looks different. Do you know which version of commons pool you have? GenericObjectPool.java:942 is javadoc in 1.5.2

If you sure about they are the same, you can try Phil's solution. Even though there maybe more bugs like this, but this fix should help your problem.


Phil Steitz added a comment - 01/Sep/09 07:24 PM
Fillippo: From the line numbers in the stack trace above, it looks like a) the pool version is 1.4 and b) you have configured the pool to block indefinitely (maxWait < 0) when the pool is exhausted and c) the pool is exhausted. The issue reported in this ticket applies only to pool 1.5+.

Assuming I have interpreted the line numbers in the trace right, most likely what is going on is your application is not closing connections and you are exhausting the pool.


Mark Thomas added a comment - 09/Sep/09 01:12 PM
This is related to POOL-147 although it highlights a wider problem.

In short, when testing to see if allocate() had assigned an object, checking latch.getPair()==null was insufficient. It is also necessary to check latch.mayCreate()==false

This has been patched in trunk and will be fixed in 1.5.3

The attached test case now runs without locking up.