Details
- Type: Bug
- Status: Open
- Priority: Critical
- Resolution: Unresolved
- Affects Version/s: 3.2.2
- Fix Version/s: None
Description
Apache Oozie 5.2.1 uses OpenJPA 2.4.2, commons-dbcp 1.4, and commons-pool 1.5.4. These are ancient versions, I know.
The issue is that when network problems or "maintenance work" on the DB side (especially with PostgreSQL) cause DB connections to be closed, the connection pool on the client side ends up exhausted. Many threads are then stuck waiting at this point:
"pool-2-thread-4" #20 prio=5 os_prio=31 tid=0x00007faf7903b800 nid=0x8603 waiting on condition [0x000000030f3e7000]
   java.lang.Thread.State: WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for <0x000000066aca8e70> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
        at org.apache.commons.pool2.impl.LinkedBlockingDeque.takeFirst(LinkedBlockingDeque.java:1324)
According to my observation, this happens because neither the JDBC driver's connection nor the DBCP wrapper around it (org.apache.commons.dbcp2.PoolableConnection) gets closed on the client side, so the broken connections are never handed back to the pool.
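To illustrate the mechanism, here is a minimal, self-contained sketch of how a bounded pool runs dry when callers whose queries fail never hand their connections back. ToyPool, FakeConnection and leakingCallsUntilExhausted are hypothetical names made up for this illustration; they are not DBCP or OpenJPA APIs. The sketch also assumes a timed poll on java.util.concurrent.LinkedBlockingDeque so that it terminates, whereas the thread dump above shows threads parked in an untimed takeFirst.

```java
import java.util.concurrent.LinkedBlockingDeque;
import java.util.concurrent.TimeUnit;

public class PoolLeakSketch {

    /** Hypothetical stand-in for a pooled connection; not a DBCP type. */
    public static final class FakeConnection {
        final int id;
        FakeConnection(int id) { this.id = id; }
    }

    /** Toy fixed-size pool: borrowers wait on a deque of idle connections. */
    public static final class ToyPool {
        private final LinkedBlockingDeque<FakeConnection> idle = new LinkedBlockingDeque<>();

        public ToyPool(int size) {
            for (int i = 0; i < size; i++) {
                idle.add(new FakeConnection(i));
            }
        }

        /** Returns an idle connection, or null if none became available in time. */
        public FakeConnection borrow(long timeoutMs) {
            try {
                return idle.pollFirst(timeoutMs, TimeUnit.MILLISECONDS);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return null;
            }
        }

        public void giveBack(FakeConnection c) {
            idle.addFirst(c);
        }
    }

    /**
     * Simulates callers whose query fails and who then drop the connection
     * without calling giveBack. Returns the number of successful borrows
     * before the pool was exhausted, or -1 if it never ran dry.
     */
    public static int leakingCallsUntilExhausted(ToyPool pool, int attempts) {
        for (int i = 0; i < attempts; i++) {
            FakeConnection c = pool.borrow(50);
            if (c == null) {
                return i; // nothing idle: every connection has been leaked
            }
            // Simulated SQLException here: the caller forgets about c.
        }
        return -1;
    }

    public static void main(String[] args) {
        int n = leakingCallsUntilExhausted(new ToyPool(2), 5);
        System.out.println("pool exhausted after " + n + " leaked borrows");
    }
}
```

With a pool of size 2, the third borrow finds nothing idle. In a real pool configured to wait indefinitely, that borrower would park just like the threads in the dump.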
Repro
(Un)Fortunately, I can also reproduce the issue with the latest and greatest commons-dbcp 2.11.0 and commons-pool 2.12.0 along with OpenJPA 3.2.2.
I've just created a Java application to reproduce the issue: https://github.com/dionusos/pool_exhausted_repro . See README.md for detailed repro steps.
What we tried so far
I got in touch with the DBCP team, who confirmed that when a connection fails, the client (here OpenJPA is the client of DBCP) should handle the exception, for example by closing the connection: DBCP-595. I agree with them: based on my own investigation, I can also confirm that DBCP is really robust when the client releases the broken connection object after catching the SQLException. Please check the 4 comments on DBCP-595 for extra details.
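As a rough illustration of "release the broken connection after catching SQLException", the sketch below uses try-with-resources so that close() is called on the borrowed connection even when a statement fails. This is not OpenJPA code. ReleaseOnError, ping and brokenDataSource are hypothetical names, and the DataSource is a java.lang.reflect.Proxy stub that simulates a connection killed on the DB side, so the example runs without a database or DBCP on the classpath.

```java
import java.lang.reflect.Proxy;
import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.concurrent.atomic.AtomicInteger;
import javax.sql.DataSource;

public class ReleaseOnError {

    /**
     * Hypothetical helper: borrows a connection, runs a trivial statement,
     * and always closes the connection, even when an SQLException is thrown.
     */
    public static boolean ping(DataSource ds) {
        try (Connection c = ds.getConnection();
             Statement s = c.createStatement()) {
            s.execute("SELECT 1");
            return true;
        } catch (SQLException e) {
            // try-with-resources has already called c.close() at this point
            return false;
        }
    }

    /**
     * Stub DataSource (an assumption for this demo only): every connection
     * method except close() throws SQLException; close() calls are counted.
     */
    public static DataSource brokenDataSource(AtomicInteger closed) {
        ClassLoader loader = ReleaseOnError.class.getClassLoader();
        Connection conn = (Connection) Proxy.newProxyInstance(
            loader,
            new Class<?>[] {Connection.class},
            (proxy, method, args) -> {
                if (method.getName().equals("close")) {
                    closed.incrementAndGet();
                    return null;
                }
                throw new SQLException("connection terminated (simulated)");
            });
        return (DataSource) Proxy.newProxyInstance(
            loader,
            new Class<?>[] {DataSource.class},
            (proxy, method, args) -> {
                if (method.getName().equals("getConnection")) {
                    return conn;
                }
                throw new UnsupportedOperationException(method.getName());
            });
    }

    public static void main(String[] args) {
        AtomicInteger closed = new AtomicInteger();
        boolean ok = ping(brokenDataSource(closed));
        System.out.println("ok=" + ok + ", close() calls=" + closed.get());
    }
}
```

With a real pooled DataSource, that close() call is what hands the connection object back to the pool. How the pool then treats the broken connection is covered in the DBCP-595 discussion rather than by this sketch.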
Ask
OpenJPA team!
- Could you please confirm that my findings are valid?
- Did I do anything wrong in my repro program?
- Oozie has retry logic implemented: https://github.com/apache/oozie/blob/318fac5/core/src/main/java/org/apache/oozie/service/JPAService.java#L397L427 but this cannot avoid the reported deadlock.
- Do you have any questions I can answer to help in the investigation?
Attachments
Issue Links
- relates to
- DBCP-595 Connection pool can be exhausted when connections are killed on the DB side (Closed)