Commons Dbcp
  1. Commons Dbcp
  2. DBCP-44

[dbcp] Evictor thread in GenericObjectPool has potential for deadlock

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Major Major
    • Resolution: Duplicate
    • Affects Version/s: 1.2.1
    • Fix Version/s: 1.3
    • Labels:
      None
    • Environment:

      Operating System: All
      Platform: All

      Description

      The GenericObjectPool Evictor thread can potentially cause a deadlock between
      its connection factory and java.sql.DriverManager. The deadlock occurs when the
      Evictor thread is trying to make enough connections to bring the pool's idle
      connections up to what's specified in minIdle, at the same time a connection is
      being requested through DriverManager.getConnection(). See the attached stack
      trace dump:

      Found one Java-level deadlock:
      =============================
      "Thread-0":
      waiting to lock monitor 0x0809a994 (object 0x698c2b48, a java.lang.Class),
      which is held by "main"
      "main":
      waiting to lock monitor 0x0809aad4 (object 0x65df7030, a
      org.apache.commons.dbcp.PoolableConnectionFactory),
      which is held by "Thread-0"

      Java stack information for the threads listed above:
      ===================================================
      "Thread-0":
      at java.sql.DriverManager.getConnection(DriverManager.java:158)

      • waiting to lock <0x698c2b48> (a java.lang.Class)
        at
        org.apache.commons.dbcp.DriverManagerConnectionFactory.createConnection(DriverManagerConnectionFactory.java:48)
        at
        org.apache.commons.dbcp.PoolableConnectionFactory.makeObject(PoolableConnectionFactory.java:290)
      • locked <0x65df7030> (a org.apache.commons.dbcp.PoolableConnectionFactory)
        at
        org.apache.commons.pool.impl.GenericObjectPool.addObject(GenericObjectPool.java:996)
        at
        org.apache.commons.pool.impl.GenericObjectPool.ensureMinIdle(GenericObjectPool.java:978)
        at
        org.apache.commons.pool.impl.GenericObjectPool.access$000(GenericObjectPool.java:124)
        at
        org.apache.commons.pool.impl.GenericObjectPool$Evictor.run(GenericObjectPool.java:1090)
        at java.lang.Thread.run(Thread.java:595)
        "main":
        at
        org.apache.commons.dbcp.PoolableConnectionFactory.makeObject(PoolableConnectionFactory.java:290)
      • waiting to lock <0x65df7030> (a
        org.apache.commons.dbcp.PoolableConnectionFactory)
        at
        org.apache.commons.pool.impl.GenericObjectPool.borrowObject(GenericObjectPool.java:771)
        at org.apache.commons.dbcp.PoolingDriver.connect(PoolingDriver.java:175)
        at java.sql.DriverManager.getConnection(DriverManager.java:525)
      • locked <0x698c2b48> (a java.lang.Class)
        at java.sql.DriverManager.getConnection(DriverManager.java:193)
      • locked <0x698c2b48> (a java.lang.Class)
        at Deadlock.main(Deadlock.java:83)

      Found 1 deadlock.

      The deadlock occurs when GenericObjectPool.addObject() calls synchronized method
      PoolableConnectionFactory.makeObject(), meanwhile static synchronized
      DriverManager.getConnection() is called. makeObject() waits for the lock on
      DriverManager to be released, whereas DriverManager waits for the lock on the
      PoolableConnectionFactory instance to be released.

      The Java program below, based on ManualPoolingDriverExample.java provided with
      DBCP, duplicates the deadlock for me within several seconds of being run:

      import java.sql.*;
      import org.apache.commons.pool.*;
      import org.apache.commons.pool.impl.*;
      import org.apache.commons.dbcp.*;

      /**

      • Duplicate DBCP pool deadlocking.
        *
      • Compile with:
      • /usr/java/jdk1.5.0/bin/javac
      • -classpath commons-collections.jar:commons-dbcp-1.2.1.jar:commons-pool-1.2.jar
      • Deadlock.java
        *
      • Run with:
      • /usr/java/jdk1.5.0/bin/java
      • -classpath
        commons-collections.jar:commons-dbcp-1.2.1.jar:commons-pool-1.2.jar:ojdbc14.jar:xerces.jar:.
      • Deadlock
        *
      • Locks still occur when compiled and run with J2SDK 1.4.1_03.
        */
        public class Deadlock {

      private static final int ACTIVE = 10;

      private static void init() {
      System.out.println("Loading drivers");

      try

      { Class.forName("oracle.jdbc.driver.OracleDriver"); Class.forName("org.apache.commons.dbcp.PoolingDriver"); }

      catch (ClassNotFoundException e)

      { e.printStackTrace(); }

      System.out.println("Setting up pool");

      try { GenericObjectPool.Config config = new GenericObjectPool.Config(); config.maxActive = ACTIVE; config.minIdle = 2; // Idle limits are low to allow more possibility of locking. config.maxIdle = 4; // Locking only occurs when there are 0 idle connections in the pool. config.maxWait = 5000L; config.testOnBorrow = true; config.testOnReturn = false; config.testWhileIdle = true; // Locking still occurs whether these tests are performed or not. config.whenExhaustedAction = GenericObjectPool.WHEN_EXHAUSTED_BLOCK; // Locking still occurs regardless of the exhausted action. config.timeBetweenEvictionRunsMillis = 3000L; // The Evictor thread is involved in the lock, so run it quite often. config.minEvictableIdleTimeMillis = 6000L; config.numTestsPerEvictionRun = 3; ObjectPool op = new GenericObjectPool(null, config); ConnectionFactory cf = new DriverManagerConnectionFactory("jdbc:oracle:thin:@oracle8server:1521:sid", "username", "password"); PoolableConnectionFactory pcf = new PoolableConnectionFactory(cf, op, null, "SELECT 1 FROM DUAL", false, true); // Locking still occurs whether there is a validation query or not. PoolingDriver pd = (PoolingDriver)DriverManager.getDriver("jdbc:apache:commons:dbcp:"); pd.registerPool("PoolName", op); } catch (SQLException e) { e.printStackTrace(); }

      System.out.println("Done");
      }

      public static void main(String[] args) {
      init();

      Connection[] c = new Connection[ACTIVE];

      try {
      printPoolStatus();

      // Loop forever to create a high load.
      while (true) {
      // Create a number of connections.
      for (int i = 0; i < ACTIVE; ++i)

      { c[i] = DriverManager.getConnection("jdbc:apache:commons:dbcp:PoolName"); printPoolStatus(); }

      // Then immmediately drop them.
      for (int i = 0; i < ACTIVE; ++i) {
      try {
      if (c[i] != null)

      { c[i].close(); printPoolStatus(); c[i] = null; }

      } catch (SQLException e)

      { e.printStackTrace(); }

      }
      }

      } catch (SQLException e)

      { e.printStackTrace(); }

      finally {
      // Close down any open connetions.
      for (int i = 0; i < ACTIVE; ++i) {
      try

      { if (c[i] != null) c[i].close(); }

      catch (SQLException e) { }
      }

      System.out.println("Closing pool");
      try

      { PoolingDriver pd = (PoolingDriver)DriverManager.getDriver("jdbc:apache:commons:dbcp:"); pd.closePool("PoolName"); }

      catch (SQLException e)

      { e.printStackTrace(); }

      System.out.println("Pool closed");
      }

      }

      /**

      • Display pool status. Locks still occur even if this method is never
        called.
        */
        private static void printPoolStatus() throws SQLException { PoolingDriver pd = (PoolingDriver)DriverManager.getDriver("jdbc:apache:commons:dbcp:"); ObjectPool op = pd.getConnectionPool("PoolName"); System.out.println("Active / idle: " + op.getNumActive() + " / " + op.getNumIdle()); }

      }

      The patch I initially suggest is as follows (sorry for not providing a diff):

      In org.apache.commons.dbcp.PoolableConnectionFactory.java we have:

      synchronized public Object makeObject() throws Exception {
      Connection conn = _connFactory.createConnection();
      if(null != _stmtPoolFactory)

      { KeyedObjectPool stmtpool = _stmtPoolFactory.createPool(); conn = new PoolingConnection(conn,stmtpool); stmtpool.setFactory((PoolingConnection)conn); }

      return new PoolableConnection(conn,_pool,_config);
      }

      I suggest changing that to this:
      public Object makeObject() throws Exception {
      Connection conn = _connFactory.createConnection();
      synchronized (this) {
      if(null != _stmtPoolFactory)

      { KeyedObjectPool stmtpool = _stmtPoolFactory.createPool(); conn = new PoolingConnection(conn,stmtpool); stmtpool.setFactory((PoolingConnection)conn); }

      return new PoolableConnection(conn,_pool,_config);
      }
      }

      Note the move of the synchronized block from the entire method to within the
      method after the connection (obtained ultimately by DriverManager) is retrieved.

      I'm afraid I don't know enough about DBCP and pooling to know the full
      ramification of this change and other calls to makeObject().

      1. testConcurrency.java
        2 kB
        Phil Steitz
      2. deadlock.txt
        8 kB
        Phil Steitz
      3. deadlock_post_patch.txt
        13 kB
        Phil Steitz
      4. DBCP-44-possible-solution.patch.txt
        3 kB
        Mark Thomas
      5. DBCP-44.patch
        12 kB
        Dain Sundstrom

        Issue Links

          Activity

          Eric Layton created issue -
          Hide
          David Marshall added a comment -

          I've encountered the same deadlock and have a couple of additional comments: the
          bug is only when using PoolingDriver and a setting of minIdle > 0 in the
          GenericObjectPool configuration.

          Suggested interim workarounds:
          a) set minIdle = 0 in your configuration for the GenericObjectPool
          b) if you know you are connection pooling within the application code, retrieve
          a connection as follows:
          ObjectPool connectionPool = poolingDriver.getConnectionPool(name);
          Connection conn = (Connection)connectionPool.borrowObject();
          This second method will allow you to have minIdle > 0 while avoiding the
          deadlock code path between threads.

          Show
          David Marshall added a comment - I've encountered the same deadlock and have a couple of additional comments: the bug is only when using PoolingDriver and a setting of minIdle > 0 in the GenericObjectPool configuration. Suggested interim workarounds: a) set minIdle = 0 in your configuration for the GenericObjectPool b) if you know you are connection pooling within the application code, retrieve a connection as follows: ObjectPool connectionPool = poolingDriver.getConnectionPool(name); Connection conn = (Connection)connectionPool.borrowObject(); This second method will allow you to have minIdle > 0 while avoiding the deadlock code path between threads.
          Henri Yandell made changes -
          Field Original Value New Value
          issue.field.bugzillaimportkey 33912 12342100
          Henri Yandell made changes -
          Key COM-1948 DBCP-44
          Assignee Jakarta Commons Developers Mailing List [ commons-dev@jakarta.apache.org ]
          Affects Version/s unspecified [ 12311647 ]
          Component/s Dbcp [ 12311109 ]
          Project Commons [ 12310458 ] Commons Dbcp [ 12310469 ]
          Phil Steitz made changes -
          Fix Version/s 1.2.2 [ 12311976 ]
          Bugzilla Id 33912
          Hide
          Phil Steitz added a comment -

          Deadlock after suggested patch, generated by testConcurrency unit test

          Show
          Phil Steitz added a comment - Deadlock after suggested patch, generated by testConcurrency unit test
          Phil Steitz made changes -
          Attachment deadlock_post_patch.txt [ 12345630 ]
          Hide
          Phil Steitz added a comment -

          Deadlock generated by testConcurrency

          Show
          Phil Steitz added a comment - Deadlock generated by testConcurrency
          Phil Steitz made changes -
          Attachment deadlock.txt [ 12345631 ]
          Hide
          Phil Steitz added a comment -

          Unit test generating deadlock using dbcp testDriver

          Show
          Phil Steitz added a comment - Unit test generating deadlock using dbcp testDriver
          Phil Steitz made changes -
          Attachment testConcurrency.java [ 12345632 ]
          Hide
          Phil Steitz added a comment -

          Like DBCP-65, I don't think there is a backward compatible way to fix this bug. The workaround suggested above (access the pool directly) is not pretty, but should avoid the deadlock.
          I have attached a test case that will generate the deadlock using the dbcp test drivers. The loop is bounded, so deadlock may not always happen. Use ctrl-break (I think) under Windows or ctrl \ on unix command line to generate thread dump once the test case hangs.
          I am moving this out to 1.3 fix version, at which time we will have to change either a) the pool impl or b) the way we dbcp uses the pool (see http://issues.apache.org/jira/browse/DBCP-65#action_12451390).

          Show
          Phil Steitz added a comment - Like DBCP-65 , I don't think there is a backward compatible way to fix this bug. The workaround suggested above (access the pool directly) is not pretty, but should avoid the deadlock. I have attached a test case that will generate the deadlock using the dbcp test drivers. The loop is bounded, so deadlock may not always happen. Use ctrl-break (I think) under Windows or ctrl \ on unix command line to generate thread dump once the test case hangs. I am moving this out to 1.3 fix version, at which time we will have to change either a) the pool impl or b) the way we dbcp uses the pool (see http://issues.apache.org/jira/browse/DBCP-65#action_12451390 ).
          Phil Steitz made changes -
          Fix Version/s 1.2.2 [ 12311976 ]
          Bugzilla Id 33912
          Affects Version/s 1.2.1 [ 12311975 ]
          Fix Version/s 1.3 [ 12311977 ]
          Phil Steitz made changes -
          Link This issue is blocked by DBCP-202 [ DBCP-202 ]
          Hide
          Phil Steitz added a comment -

          After looking at the deadlock traces and analyzing the code some more, I think that the root of the problem in this case is contention for the DriverManager. The DriverManager methods in the traces are static synchronized. What is going on is that direct client calls to DriverManager.getConnection immediately acquire the static lock on this class and then use PoolingDriver, which needs to lock the pool. The pool Timer locks the pool and tries to use the DriverManager to access the the jdbc driver to create a new concrete connection, but can't acquire the static lock on DriverManager. The order of locks here is beyond control of dbcp. So, unless and until the Timer mechanism of pool is reworked somehow, I don't think this can be fixed.

          The workaround is to either disable the pool Timer or avoid using the DriverManager as a client with PoolingDriver.

          Show
          Phil Steitz added a comment - After looking at the deadlock traces and analyzing the code some more, I think that the root of the problem in this case is contention for the DriverManager. The DriverManager methods in the traces are static synchronized. What is going on is that direct client calls to DriverManager.getConnection immediately acquire the static lock on this class and then use PoolingDriver, which needs to lock the pool. The pool Timer locks the pool and tries to use the DriverManager to access the the jdbc driver to create a new concrete connection, but can't acquire the static lock on DriverManager. The order of locks here is beyond control of dbcp. So, unless and until the Timer mechanism of pool is reworked somehow, I don't think this can be fixed. The workaround is to either disable the pool Timer or avoid using the DriverManager as a client with PoolingDriver.
          Hide
          Henri Yandell added a comment -

          Should we make a Pool issue for this?

          Show
          Henri Yandell added a comment - Should we make a Pool issue for this?
          Hide
          Dain Sundstrom added a comment -

          This is an architectural problem in commons-pool. GenericObjectPool.borrowObject() is synchronized so when it needs to make a new object, that method is called within a synchronized block. Here are the stack traces of the deal lock in our case:

          main@1, priority=5, in group 'main', status: 'MONITOR'
          blocks Timer-0@fc
          waiting for Timer-0@fc
          borrowObject():781, GenericObjectPool.java
          connect():176, PoolingDriver.java
          getConnection():525, DriverManager.java
          getConnection():140, DriverManager.java
          main():107, Deadlock.java

          Timer-0@fc daemon, priority=5, in group 'main', status: 'MONITOR'
          blocks main@1
          waiting for main@1
          getConnection():138, DriverManager.java
          createConnection():68, DriverManagerConnectionFactory.java
          makeObject():329, PoolableConnectionFactory.java
          addObject():1059, GenericObjectPool.java
          ensureMinIdle():1040, GenericObjectPool.java
          access$100():128, GenericObjectPool.java
          run():1117, GenericObjectPool.java
          mainLoop():512, Timer.java
          run():462, Timer.java

          This is a classic two lock dead lock. The main thread acquires the locks on DriveManager then GenericObjectPool, and at the same time the timer thread attempts to acquire them in the opposite order GenericObjectPool and then DriveManger.

          There are really only two ways to fix this 1) guarantee locks are always acquired in the same order or 2) eliminate one of the locks. The first one is not really possible as both objects are generally publicly accessible, and the second fix will require a complex change to the GenericObjectPool code.

          Attached is a patch that adds the Deadlock class, which has a main method that causes the dead lock. The patch also fixes the synchronization in PoolableConnectionFactory. I also added property to TesterDriver to cause the thread to sleep in getConnection so the dead lock can be reproduced.

          I'm going to attempt to write a new pool using the Java5 concurrent code which doesn't call makeObject inside of a synchronized block.

          Show
          Dain Sundstrom added a comment - This is an architectural problem in commons-pool. GenericObjectPool.borrowObject() is synchronized so when it needs to make a new object, that method is called within a synchronized block. Here are the stack traces of the deal lock in our case: main@1, priority=5, in group 'main', status: 'MONITOR' blocks Timer-0@fc waiting for Timer-0@fc borrowObject():781, GenericObjectPool.java connect():176, PoolingDriver.java getConnection():525, DriverManager.java getConnection():140, DriverManager.java main():107, Deadlock.java Timer-0@fc daemon, priority=5, in group 'main', status: 'MONITOR' blocks main@1 waiting for main@1 getConnection():138, DriverManager.java createConnection():68, DriverManagerConnectionFactory.java makeObject():329, PoolableConnectionFactory.java addObject():1059, GenericObjectPool.java ensureMinIdle():1040, GenericObjectPool.java access$100():128, GenericObjectPool.java run():1117, GenericObjectPool.java mainLoop():512, Timer.java run():462, Timer.java This is a classic two lock dead lock. The main thread acquires the locks on DriveManager then GenericObjectPool, and at the same time the timer thread attempts to acquire them in the opposite order GenericObjectPool and then DriveManger. There are really only two ways to fix this 1) guarantee locks are always acquired in the same order or 2) eliminate one of the locks. The first one is not really possible as both objects are generally publicly accessible, and the second fix will require a complex change to the GenericObjectPool code. Attached is a patch that adds the Deadlock class, which has a main method that causes the dead lock. The patch also fixes the synchronization in PoolableConnectionFactory. I also added property to TesterDriver to cause the thread to sleep in getConnection so the dead lock can be reproduced. I'm going to attempt to write a new pool using the Java5 concurrent code which doesn't call makeObject inside of a synchronized block.
          Dain Sundstrom made changes -
          Attachment DBCP-44.patch [ 12362270 ]
          Hide
          Holger Hoffstätte added a comment -

          Hi Dain, I agree that the evictor thread is problematic (it should go away and/or be replaced by a custom evictor thingy that gets scheduled externally), but it is not clear to me how you could make makeObject() non-synchronized. If you want to go for the usual create-and-CAS pattern you risk a potential double allocation, and pool is definitely used for situations where this would be very bad.
          A multi-staged locking hierarchy (implemented with the new ReentrantReadWrite locks) could help us to the effect that you outlined wrt. the lock ordering. Entering readers (pool consumers) all would have to pass the global writer lock entrance; once someone is inside, the periodic evictor has to enqueue at the fron t door until he gets the write lock, and can proceed.
          Completely concurrent (lock-free) eviction would be much harder.
          A mutli-staged locking hierarchy would also enable us to properly decouple some of the other paths inside borrow/returnObject. For returnObject I have already submitted such a simple patch.

          Show
          Holger Hoffstätte added a comment - Hi Dain, I agree that the evictor thread is problematic (it should go away and/or be replaced by a custom evictor thingy that gets scheduled externally), but it is not clear to me how you could make makeObject() non-synchronized. If you want to go for the usual create-and-CAS pattern you risk a potential double allocation, and pool is definitely used for situations where this would be very bad. A multi-staged locking hierarchy (implemented with the new ReentrantReadWrite locks) could help us to the effect that you outlined wrt. the lock ordering. Entering readers (pool consumers) all would have to pass the global writer lock entrance; once someone is inside, the periodic evictor has to enqueue at the fron t door until he gets the write lock, and can proceed. Completely concurrent (lock-free) eviction would be much harder. A mutli-staged locking hierarchy would also enable us to properly decouple some of the other paths inside borrow/returnObject. For returnObject I have already submitted such a simple patch.
          Hide
          Dain Sundstrom added a comment -

          This will take experimentation, but off the top of my head, I'd do somthing like this:

          while(true) {
          synchronized(idleCollection) {
          if (!idleCollection.isEmpty())

          { return idleCollection.removeFirst(); }

          }

          Future future = this.allocationFutureAtomic.get();
          if (allocationFuture == null) {
          future = new AllocationFuture();
          if (this.allocationFutureAtomic.weakCompareAndSet(null, future))

          { future.get(); }

          }
          }

          The AllocationFuture would create new connection(s), add them to the idle pool, and then clear the future atomic. Of course this can be done with primitives, but it is more painful to code. It would be cleaner to use an executor to process the future, so it can potentially be done on another thread which isn't holding locks, and if the makeObject() hangs you, don't have to interrupt an application thread. All of this becomes more difficult when you try to implement a keyed pool due to double book keeping of a global count and per key count.

          Anyway, I'll experiment with some implementations and if I come up with anything good, I'll post it.

          Show
          Dain Sundstrom added a comment - This will take experimentation, but off the top of my head, I'd do somthing like this: while(true) { synchronized(idleCollection) { if (!idleCollection.isEmpty()) { return idleCollection.removeFirst(); } } Future future = this.allocationFutureAtomic.get(); if (allocationFuture == null) { future = new AllocationFuture(); if (this.allocationFutureAtomic.weakCompareAndSet(null, future)) { future.get(); } } } The AllocationFuture would create new connection(s), add them to the idle pool, and then clear the future atomic. Of course this can be done with primitives, but it is more painful to code. It would be cleaner to use an executor to process the future, so it can potentially be done on another thread which isn't holding locks, and if the makeObject() hangs you, don't have to interrupt an application thread. All of this becomes more difficult when you try to implement a keyed pool due to double book keeping of a global count and per key count. Anyway, I'll experiment with some implementations and if I come up with anything good, I'll post it.
          Hide
          Dain Sundstrom added a comment -

          BTW it isn't the evictor thread that is problematic per say (it could have other issues ). It is the fact that when generic object pool calls makeObject it is holding a lock on the whole pool. This type of lock holding open out call is bound to cause dead locks. The example in this issue is a dead lock between a thread using a DBCP pool to get a connection and a thread that goes directly to the DriveManager for a connection.

          Show
          Dain Sundstrom added a comment - BTW it isn't the evictor thread that is problematic per say (it could have other issues ). It is the fact that when generic object pool calls makeObject it is holding a lock on the whole pool. This type of lock holding open out call is bound to cause dead locks. The example in this issue is a dead lock between a thread using a DBCP pool to get a connection and a thread that goes directly to the DriveManager for a connection.
          Hide
          Dain Sundstrom added a comment -

          Ok one more comment before I get lunch.... My sample code above is bad. It is a hard loop (duh). Instead of hard looping, the second check on the idleCollection could wait for a item to become available (easy with a concurrent queue). Which leave the possibility that someone sneaks in and steals the new connection we were waiting for As, I said it will take some experimentation to get this right, but that is the fun part.

          Show
          Dain Sundstrom added a comment - Ok one more comment before I get lunch.... My sample code above is bad. It is a hard loop (duh). Instead of hard looping, the second check on the idleCollection could wait for a item to become available (easy with a concurrent queue). Which leave the possibility that someone sneaks in and steals the new connection we were waiting for As, I said it will take some experimentation to get this right, but that is the fun part.
          Hide
          Vinod K added a comment -

          We've seen a deaklock in our env, when thread 1 holds lock A then wants lock B, thread 2 holds lock B then wants lock A. As other threads tried to obtain lock B they started to hang, until almost all threads were hung. Is this the same issue.

          2LKDEADLOCKTHR Thread "Thread-5351" (0x7E9B46F0)
          3LKDEADLOCKWTR is waiting for:
          4LKDEADLOCKMON sys_mon_t:0x833AEF38 infl_mon_t: 0x00000000:
          4LKDEADLOCKOBJ org.apache.commons.pool.impl.GenericObjectPool@3B974CF0/3B974CF8:
          3LKDEADLOCKOWN which is owned by:
          2LKDEADLOCKTHR Thread "Thread-42" (0x7E563110)
          3LKDEADLOCKWTR which is waiting for:
          4LKDEADLOCKMON sys_mon_t:0x7C3C9168 infl_mon_t: 0x00000000:
          4LKDEADLOCKOBJ com.vignette.dbms.VgnConnectionPool$_PoolableConnectionFactory@3B826CB0/3B826CB8:
          3LKDEADLOCKOWN which is owned by:
          2LKDEADLOCKTHR Thread "Thread-5351" (0x7E9B46F0)
          NULL
          NULL
          2LKDEADLOCKTHR Thread "Thread-42" (0x7E563110)
          3LKDEADLOCKWTR is waiting for:
          4LKDEADLOCKMON sys_mon_t:0x7C3C9168 infl_mon_t: 0x00000000:
          4LKDEADLOCKOBJ com.vignette.dbms.VgnConnectionPool$_PoolableConnectionFactory@3B826CB0/3B826CB8:
          3LKDEADLOCKOWN which is owned by:
          2LKDEADLOCKTHR Thread "Thread-5351" (0x7E9B46F0)
          3LKDEADLOCKWTR which is waiting for:
          4LKDEADLOCKMON sys_mon_t:0x833AEF38 infl_mon_t: 0x00000000:
          4LKDEADLOCKOBJ org.apache.commons.pool.impl.GenericObjectPool@3B974CF0/3B974CF8:
          3LKDEADLOCKOWN which is owned by:
          2LKDEADLOCKTHR Thread "Thread-42" (0x7E563110)
          NULL

          Show
          Vinod K added a comment - We've seen a deaklock in our env, when thread 1 holds lock A then wants lock B, thread 2 holds lock B then wants lock A. As other threads tried to obtain lock B they started to hang, until almost all threads were hung. Is this the same issue. 2LKDEADLOCKTHR Thread "Thread-5351" (0x7E9B46F0) 3LKDEADLOCKWTR is waiting for: 4LKDEADLOCKMON sys_mon_t:0x833AEF38 infl_mon_t: 0x00000000: 4LKDEADLOCKOBJ org.apache.commons.pool.impl.GenericObjectPool@3B974CF0/3B974CF8: 3LKDEADLOCKOWN which is owned by: 2LKDEADLOCKTHR Thread "Thread-42" (0x7E563110) 3LKDEADLOCKWTR which is waiting for: 4LKDEADLOCKMON sys_mon_t:0x7C3C9168 infl_mon_t: 0x00000000: 4LKDEADLOCKOBJ com.vignette.dbms.VgnConnectionPool$_PoolableConnectionFactory@3B826CB0/3B826CB8: 3LKDEADLOCKOWN which is owned by: 2LKDEADLOCKTHR Thread "Thread-5351" (0x7E9B46F0) NULL NULL 2LKDEADLOCKTHR Thread "Thread-42" (0x7E563110) 3LKDEADLOCKWTR is waiting for: 4LKDEADLOCKMON sys_mon_t:0x7C3C9168 infl_mon_t: 0x00000000: 4LKDEADLOCKOBJ com.vignette.dbms.VgnConnectionPool$_PoolableConnectionFactory@3B826CB0/3B826CB8: 3LKDEADLOCKOWN which is owned by: 2LKDEADLOCKTHR Thread "Thread-5351" (0x7E9B46F0) 3LKDEADLOCKWTR which is waiting for: 4LKDEADLOCKMON sys_mon_t:0x833AEF38 infl_mon_t: 0x00000000: 4LKDEADLOCKOBJ org.apache.commons.pool.impl.GenericObjectPool@3B974CF0/3B974CF8: 3LKDEADLOCKOWN which is owned by: 2LKDEADLOCKTHR Thread "Thread-42" (0x7E563110) NULL
          Hide
          David Rees added a comment - - edited

          I've encountered what appears to be the same deadlock when returning a connection to the pool.

          commons-dbcp-1.2.2
          commons-pool-1.4

          Is there any work being done on this? This bug has been open for quite some time.

          Are there any workarounds to avoid this issue? Googling simply reveals other deadlocks with similar stack traces.

          Here is my stack trace:

          "Timer-0":
          at org.apache.commons.dbcp.AbandonedTrace.addTrace(AbandonedTrace.java:175)
          waiting to lock <0x0000002aa04d4be8> (a org.apache.commons.dbcp.PoolableConnection)
          at org.apache.commons.dbcp.AbandonedTrace.init(AbandonedTrace.java:92)
          at org.apache.commons.dbcp.AbandonedTrace.<init>(AbandonedTrace.java:82)
          at org.apache.commons.dbcp.DelegatingStatement.<init>(DelegatingStatement.java:61)
          at org.apache.commons.dbcp.DelegatingConnection.createStatement(DelegatingConnection.java:224)
          at org.apache.commons.dbcp.PoolableConnectionFactory.validateConnection(PoolableConnectionFactory.java:331)
          at org.apache.commons.dbcp.PoolableConnectionFactory.validateObject(PoolableConnectionFactory.java:312)
          at org.apache.commons.pool.impl.GenericObjectPool.evict(GenericObjectPool.java:1217)
          locked <0x0000002a9ee91bf8> (a org.apache.commons.pool.impl.GenericObjectPool)
          at org.apache.commons.pool.impl.GenericObjectPool$Evictor.run(GenericObjectPool.java:1341)
          at java.util.TimerThread.mainLoop(Timer.java:512)
          at java.util.TimerThread.run(Timer.java:462)
          "SocketRequest-8":
          at org.apache.commons.pool.impl.GenericObjectPool.addObjectToPool(GenericObjectPool.java:1137)
          waiting to lock <0x0000002a9ee91bf8> (a org.apache.commons.pool.impl.GenericObjectPool)
          at org.apache.commons.pool.impl.GenericObjectPool.returnObject(GenericObjectPool.java:1076)
          at org.apache.commons.dbcp.PoolableConnection.close(PoolableConnection.java:87)
          locked <0x0000002aa04d4be8> (a org.apache.commons.dbcp.PoolableConnection)
          at org.apache.commons.dbcp.PoolingDataSource$PoolGuardConnectionWrapper.close(PoolingDataSource.java:181)
          at my.package.DbUtil.close(DbUtil.java:545)
          at my.package.DbUtil.close(DbUtil.java:587)
          at my.package.Request.closeConnection(NZTABRequest.java:970)
          at my.package.SocketRequest.processRequest(SocketRequest.java:57)
          at my.package.SocketRequest.processRequest(SocketRequest.java:110)
          at my.package.SocketRequest.waitForWork(SocketRequest.java:82)
          locked <0x0000002a9ed88ef0> (a my.package.SocketRequest)
          at my.package.SocketRequest.run(SocketRequest.java:71)

          (Sorry for all the edits!)

          Show
          David Rees added a comment - - edited I've encountered what appears to be the same deadlock when returning a connection to the pool. commons-dbcp-1.2.2 commons-pool-1.4 Is there any work being done on this? This bug has been open for quite some time. Are there any workarounds to avoid this issue? Googling simply reveals other deadlocks with similar stack traces. Here is my stack trace: "Timer-0": at org.apache.commons.dbcp.AbandonedTrace.addTrace(AbandonedTrace.java:175) waiting to lock <0x0000002aa04d4be8> (a org.apache.commons.dbcp.PoolableConnection) at org.apache.commons.dbcp.AbandonedTrace.init(AbandonedTrace.java:92) at org.apache.commons.dbcp.AbandonedTrace.<init>(AbandonedTrace.java:82) at org.apache.commons.dbcp.DelegatingStatement.<init>(DelegatingStatement.java:61) at org.apache.commons.dbcp.DelegatingConnection.createStatement(DelegatingConnection.java:224) at org.apache.commons.dbcp.PoolableConnectionFactory.validateConnection(PoolableConnectionFactory.java:331) at org.apache.commons.dbcp.PoolableConnectionFactory.validateObject(PoolableConnectionFactory.java:312) at org.apache.commons.pool.impl.GenericObjectPool.evict(GenericObjectPool.java:1217) locked <0x0000002a9ee91bf8> (a org.apache.commons.pool.impl.GenericObjectPool) at org.apache.commons.pool.impl.GenericObjectPool$Evictor.run(GenericObjectPool.java:1341) at java.util.TimerThread.mainLoop(Timer.java:512) at java.util.TimerThread.run(Timer.java:462) "SocketRequest-8": at org.apache.commons.pool.impl.GenericObjectPool.addObjectToPool(GenericObjectPool.java:1137) waiting to lock <0x0000002a9ee91bf8> (a org.apache.commons.pool.impl.GenericObjectPool) at org.apache.commons.pool.impl.GenericObjectPool.returnObject(GenericObjectPool.java:1076) at org.apache.commons.dbcp.PoolableConnection.close(PoolableConnection.java:87) locked <0x0000002aa04d4be8> (a org.apache.commons.dbcp.PoolableConnection) at org.apache.commons.dbcp.PoolingDataSource$PoolGuardConnectionWrapper.close(PoolingDataSource.java:181) at my.package.DbUtil.close(DbUtil.java:545) at my.package.DbUtil.close(DbUtil.java:587) at my.package.Request.closeConnection(NZTABRequest.java:970) at my.package.SocketRequest.processRequest(SocketRequest.java:57) at my.package.SocketRequest.processRequest(SocketRequest.java:110) at my.package.SocketRequest.waitForWork(SocketRequest.java:82) locked <0x0000002a9ed88ef0> (a my.package.SocketRequest) at my.package.SocketRequest.run(SocketRequest.java:71) (Sorry for all the edits!)
          David Rees made changes -
          Attachment evictor-close-deadlock-stacktrace.txt [ 12376685 ]
          David Rees made changes -
          Attachment evictor-close-deadlock-stacktrace.txt [ 12376685 ]
          Hide
          David Rees added a comment -

          It's funny, I've been using DBCP for years without having run across this issue once, and now I've had two separate applications run across the same deadlock within a week.

          I have another deadlock which is identical to the one above, except that the thread closing the connection is a Servlet running in Tomcat, but the cause is the same.

          The Evictor thread runs, locks the GenericObjectPool, enters validateConnection and then attempts to lock a Poolable Connection.
          Another thread Locks a PoolableConection, and attempts to lock the GenericObjectPool.

          I've been analyzing the deadlock and it appears that the initial suggested workaround of setting minIdle to 0. In my previous comment, the application did have a minIdle setting of 1. However, in my latest deadlock, minIdle was set to zero.

          However, in both cases, the testWhileIdle is to true which appears to cause the Evictor to run.

          So to really avoid this issue, it appears that you should set minIdle to 0 AND turn off testWhileIdle. Or is minIdle really enough (and I am missing something?)

          Show
          David Rees added a comment - It's funny, I've been using DBCP for years without having run across this issue once, and now I've had two separate applications run across the same deadlock within a week. I have another deadlock which is identical to the one above, except that the thread closing the connection is a Servlet running in Tomcat, but the cause is the same. The Evictor thread runs, locks the GenericObjectPool, enters validateConnection and then attempts to lock a Poolable Connection. Another thread Locks a PoolableConection, and attempts to lock the GenericObjectPool. I've been analyzing the deadlock and it appears that the initial suggested workaround of setting minIdle to 0. In my previous comment, the application did have a minIdle setting of 1. However, in my latest deadlock, minIdle was set to zero. However, in both cases, the testWhileIdle is to true which appears to cause the Evictor to run. So to really avoid this issue, it appears that you should set minIdle to 0 AND turn off testWhileIdle. Or is minIdle really enough (and I am missing something?)
          Hide
          David Rees added a comment -

          I've been commenting on the wrong bug it appears, my deadlock case appears to be the same as DBCP-65 (which is "fixed"?)

          Show
          David Rees added a comment - I've been commenting on the wrong bug it appears, my deadlock case appears to be the same as DBCP-65 (which is "fixed"?)
          Hide
          Phil Steitz added a comment -

          This is a new issue, though the basic problem of contention between the Evictor and client threads is the same. Turning off testWhileIdle or just setting timeBetweenEvictionRuns to the default (-1) so the evictor does not run should prevent this.

          Looks to me like what is going on here is

          1) Client thread ("SocketRequest-8") calls close on a connection from the pool. This locks the PoolableConnection.
          2) The returned connection gets added back into the pool
          3) The evictor starts and locks the pool
          4) The evictor tries to validate the newly returned connection, but needs to acquire its lock
          5) returnObject (actually addObjectToPool) needs to acquire the pool lock to complete
          **deadlock**

          This is new because pool 1.4 reduced synchronization scope in returnObject, which used to be fully synchronized (opening the door for 3). We could revert to full synchronization of returnObject in pool, but that was a source of performance problems and I think the root cause here is really how DBCP uses the GOP Evictor without controlling client locks and we need to find a solution for that problem.

          Reverting to pool 1.3 will also eliminate this particular deadlock.

          Show
          Phil Steitz added a comment - This is a new issue, though the basic problem of contention between the Evictor and client threads is the same. Turning off testWhileIdle or just setting timeBetweenEvictionRuns to the default (-1) so the evictor does not run should prevent this. Looks to me like what is going on here is 1) Client thread ("SocketRequest-8") calls close on a connection from the pool. This locks the PoolableConnection. 2) The returned connection gets added back into the pool 3) The evictor starts and locks the pool 4) The evictor tries to validate the newly returned connection, but needs to acquire its lock 5) returnObject (actually addObjectToPool) needs to acquire the pool lock to complete ** deadlock ** This is new because pool 1.4 reduced synchronization scope in returnObject, which used to be fully synchronized (opening the door for 3). We could revert to full synchronization of returnObject in pool, but that was a source of performance problems and I think the root cause here is really how DBCP uses the GOP Evictor without controlling client locks and we need to find a solution for that problem. Reverting to pool 1.3 will also eliminate this particular deadlock.
          Hide
          David Rees added a comment -

          Phil, thanks for the comment. Should I open a new bug with this particular deadlock scenario? Pool 1.4 could explain why I'm hitting it recently, I've only started using Pool 1.4 in the past month or so.

          Show
          David Rees added a comment - Phil, thanks for the comment. Should I open a new bug with this particular deadlock scenario? Pool 1.4 could explain why I'm hitting it recently, I've only started using Pool 1.4 in the past month or so.
          Hide
          Phil Steitz added a comment -

          The new deadlock is definitely due to pool 1.4. Lets keep tracking the DBCP issue here, since the basic problem is the same. I will open a separate issue against pool.

          Show
          Phil Steitz added a comment - The new deadlock is definitely due to pool 1.4. Lets keep tracking the DBCP issue here, since the basic problem is the same. I will open a separate issue against pool.
          Phil Steitz made changes -
          Link This issue is blocked by POOL-125 [ POOL-125 ]
          Hide
          Phil Steitz added a comment -

          The last deadlock above should be resolved by the fix applied in r 672097 for DBCP-270.

          Show
          Phil Steitz added a comment - The last deadlock above should be resolved by the fix applied in r 672097 for DBCP-270 .
          Hide
          Mark Thomas added a comment -

          I have attached a possible solution to this issue that removes the sync from addObject()

          The upside is it should no longer be possible for the evcitor to be the source of a deadlock.

          The downside is a pool where (idle==0 && (active + 1)==maxActive && minIdle>0) may appear to be exhausted to a client if the evictor thread is in the process of creating the last object when the client calls borrowObject(). This is really only an issue if the pool is configured with WHEN_EXHAUSTED_FAIL.

          Assuming my patch is correct, the question becomes is it better to have the risk of a deadlock or better to have the risk of the edge case described above. My own view is that the deadlock is worse.

          Thoughts?

          Show
          Mark Thomas added a comment - I have attached a possible solution to this issue that removes the sync from addObject() The upside is it should no longer be possible for the evcitor to be the source of a deadlock. The downside is a pool where (idle==0 && (active + 1)==maxActive && minIdle>0) may appear to be exhausted to a client if the evictor thread is in the process of creating the last object when the client calls borrowObject(). This is really only an issue if the pool is configured with WHEN_EXHAUSTED_FAIL. Assuming my patch is correct, the question becomes is it better to have the risk of a deadlock or better to have the risk of the edge case described above. My own view is that the deadlock is worse. Thoughts?
          Mark Thomas made changes -
          Attachment DBCP-44-possible-solution.patch.txt [ 12403604 ]
          Hide
          Phil Steitz added a comment -

          Agree that if the patch works as designed, it should be applied. I am out of pocket for next few days but will analyze and test to verify. Need to make sure removing synch from addObject does not create possibility of maxActive exceeded.

          Show
          Phil Steitz added a comment - Agree that if the patch works as designed, it should be applied. I am out of pocket for next few days but will analyze and test to verify. Need to make sure removing synch from addObject does not create possibility of maxActive exceeded.
          Hide
          Mark Thomas added a comment -

          If the above patch is viewed as OK then similar patches will be required for the other pools to fully resolve this issue.

          Show
          Mark Thomas added a comment - If the above patch is viewed as OK then similar patches will be required for the other pools to fully resolve this issue.
          Hide
          Phil Steitz added a comment -

          I have reviewed and tested the patch and am +1 for applying it and resolving this issue, with closure pending pool 1.5 release.

          The change in AbandonedTrace synchronization scope made to address DBCP-270 should eliminate the second deadlock scenario reported in this issue and the first one should be prevented by Mark's patch. The unit test attached above, TestConcurrency, consistently deadlocks under JDK 1.5 before Mark's pool patch, but succeeds with a pool jar patched with Mark's patch. I am not sure why exactly, but neither the maven nor ant runner deadlocks on this test pre-patch under 1.6 (Apple JDK, at least).

          I agree that GenericKeyedObjectPool needs to be similarly patched.

          I agree with Dain's comment on "lock holding open outcall..bound to cause deadlocks" and opened POOL-125 to address the general case.

          Show
          Phil Steitz added a comment - I have reviewed and tested the patch and am +1 for applying it and resolving this issue, with closure pending pool 1.5 release. The change in AbandonedTrace synchronization scope made to address DBCP-270 should eliminate the second deadlock scenario reported in this issue and the first one should be prevented by Mark's patch. The unit test attached above, TestConcurrency, consistently deadlocks under JDK 1.5 before Mark's pool patch, but succeeds with a pool jar patched with Mark's patch. I am not sure why exactly, but neither the maven nor ant runner deadlocks on this test pre-patch under 1.6 (Apple JDK, at least). I agree that GenericKeyedObjectPool needs to be similarly patched. I agree with Dain's comment on "lock holding open outcall..bound to cause deadlocks" and opened POOL-125 to address the general case.
          Hide
          Mark Thomas added a comment -

          Just discovered that my proposed patch also fixes POOL-107. Since this is really a pool issue it makes more sense to track progress there rather than here.

          Show
          Mark Thomas added a comment - Just discovered that my proposed patch also fixes POOL-107 . Since this is really a pool issue it makes more sense to track progress there rather than here.
          Mark Thomas made changes -
          Status Open [ 1 ] Resolved [ 5 ]
          Resolution Duplicate [ 3 ]
          Phil Steitz made changes -
          Status Resolved [ 5 ] Closed [ 6 ]

            People

            • Assignee:
              Unassigned
              Reporter:
              Eric Layton
            • Votes:
              4 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development