Details
Type: Bug
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: 2.0, 2.1, 2.2, 2.3
None
Description
We ran into this bug via a Redis library that uses a GenericKeyedObjectPool for connection pooling. We found that over the course of several days, the connections available to our application were dwindling.
Some relevant configuration:
testWhileIdle: false
numTestsPerEvictionRun: -1
minEvictableIdleTimeMillis: 60000
timeBetweenEvictionRunsMillis: 30000
maxTotalPerKey: 400
maxIdlePerKey: 400
(In a more minimal repro case I developed later, the problem happens much faster if minEvictableIdleTimeMillis and timeBetweenEvictionRunsMillis are reduced greatly, even down to 1.)
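For reference, the settings listed above could be expressed in Java roughly as follows. This is only a sketch: the class name ReportedPoolConfig and the build() helper are made up for illustration, it needs commons-pool2 on the classpath, and it uses the raw config type and the millisecond setters as they exist in the 2.x line discussed here (later releases may make the class generic or deprecate some of these setters).

```java
import org.apache.commons.pool2.impl.GenericKeyedObjectPoolConfig;

// Hypothetical helper that mirrors the configuration quoted in this report.
public class ReportedPoolConfig {

    @SuppressWarnings({"rawtypes", "deprecation"})
    static GenericKeyedObjectPoolConfig build() {
        GenericKeyedObjectPoolConfig config = new GenericKeyedObjectPoolConfig();
        config.setTestWhileIdle(false);
        config.setNumTestsPerEvictionRun(-1);           // negative value, as in the report
        config.setMinEvictableIdleTimeMillis(60000L);   // 60 s idle before eviction
        config.setTimeBetweenEvictionRunsMillis(30000L); // evictor runs every 30 s
        config.setMaxTotalPerKey(400);
        config.setMaxIdlePerKey(400);
        return config;
    }
}
```

The resulting object would be passed to the GenericKeyedObjectPool constructor together with whatever keyed factory the application uses.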
We discovered that this is what happens (looking at 2.3 code, but the problem occurred with 2.2 and 2.3):
- In evict(), there is a variable idleObjects which starts as null
- The branch where idleObjects is set is not necessarily taken
- The null idleObjects is passed to underTest.endEvictionTest(idleObjects)
- If the object underTest had previously been borrowed and was thus set to the EVICTION_RETURN_TO_HEAD state, then endEvictionTest throws an NPE
- By default this is silently swallowed and the evictor continues on
- Now the object that had been underTest is set to state IDLE, but is not in the pool's idleObjects list, so it is lost
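The sequence above can be modelled without the library. The following is a simplified, self-contained sketch, not the commons-pool code: the class LostObjectSketch, the Entry type, and the methods markBorrowAttempt and isLostAfterEviction are hypothetical names, and the assumption that a borrower removes the entry from the idle queue before failing to allocate it is my reading of the behaviour, not something quoted from the source.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Simplified model of the failure described in this report (not library code).
public class LostObjectSketch {
    enum State { IDLE, EVICTION, EVICTION_RETURN_TO_HEAD }

    /** Hypothetical stand-in for a pooled object wrapper. */
    static final class Entry {
        State state = State.IDLE;

        void startEvictionTest() {
            state = State.EVICTION;
        }

        // A borrower that finds the entry under test cannot take it and marks
        // it to be returned to the head of the idle queue when the test ends.
        void markBorrowAttempt() {
            if (state == State.EVICTION) {
                state = State.EVICTION_RETURN_TO_HEAD;
            }
        }

        boolean endEvictionTest(Deque<Entry> idleQueue) {
            if (state == State.EVICTION) {
                state = State.IDLE;
                return true;
            }
            if (state == State.EVICTION_RETURN_TO_HEAD) {
                state = State.IDLE;          // state is reset first
                idleQueue.offerFirst(this);  // throws NPE if idleQueue is null
            }
            return false;
        }
    }

    /** Returns true if the entry ends up IDLE but absent from the idle queue. */
    static boolean isLostAfterEviction(boolean evictorHoldsNullQueue) {
        Deque<Entry> idleObjects = new ArrayDeque<>();
        Entry underTest = new Entry();
        idleObjects.addFirst(underTest);

        underTest.startEvictionTest();           // evictor begins its test
        Entry polled = idleObjects.pollFirst();  // borrower takes it off the queue
        polled.markBorrowAttempt();              // but cannot allocate it

        try {
            underTest.endEvictionTest(evictorHoldsNullQueue ? null : idleObjects);
        } catch (NullPointerException swallowed) {
            // Mirrors the report: the exception is silently swallowed.
        }
        return underTest.state == State.IDLE && !idleObjects.contains(underTest);
    }

    public static void main(String[] args) {
        System.out.println("null queue, lost: " + isLostAfterEviction(true));
        System.out.println("real queue, lost: " + isLostAfterEviction(false));
    }
}
```

In this model the entry is lost only when the evictor passes a null queue. When the real queue is passed, the entry is put back at the head and remains available.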
If it would help, I can clean up my repro program and attach it, but it is written in Clojure, not Java. I can also try to write a Java repro, but I might not have time to do that for a little while.
This bug can be avoided by disabling the evictor. That doesn't help us, though, because we rely on background eviction to avoid running into server-side connection timeouts.