Issue Details (XML | Word | Printable)

Key: POOL-86
Type: Bug Bug
Status: Resolved Resolved
Resolution: Fixed
Priority: Major Major
Assignee: Phil Steitz
Reporter: Mike Martin
Votes: 1
Watchers: 1
Operations

If you were logged in you would be able to see more operations.
Commons Pool

GenericKeyedObjectPool retaining too many idle objects

Created: 10/Oct/06 11:01 PM   Updated: 08/Dec/07 08:52 AM
Return to search
Component/s: None
Affects Version/s: 1.3
Fix Version/s: 2.0

Time Tracking:
Not Specified

File Attachments:
  Size
Text File Licensed for inclusion in ASF works pool-86.patch 2006-10-30 08:15 PM Mike Martin 2 kB
Text File Licensed for inclusion in ASF works pool-86.withtest.patch 2006-10-30 09:07 PM Mike Martin 4 kB

Resolution Date: 07/Oct/07 12:09 AM


 Description  « Hide
There are two somewhat related problems in GenericKeyedObjectPool that cause
many more idle objects to be retained than should be, for much longer than they
should be.

Firstly, borrowObject() is returning the LRU object rather than the MRU object.
That minimizes rather than maximizes object reuse and tends to refresh all the
idle objects, preventing them from becoming evictable.

The idle LinkedList is being maintained with:

borrowObject: list.removeFirst()
returnObject: list.addLast()

These should either both be ...First() or both ...Last() so the list maintains
a newer-to-older, or vice-versa, ordering. The code in evict() works from the
end of the list which indicates newer-to-older might have been originally
intended.

Secondly, evict() itself has a couple of problems, both of which only show up
when many keys are in play:

1. Once it processes a key it doesn't advance to the next key.

2. _evictLastIndex is not working as documented ("Position in the _pool where
the _evictor last stopped"). Instead it's the position where the last scan
started, and becomes the position at which it attempts to start scanning
in the next pool. That just causes objects eligible for eviction to
sometimes be skipped entirely.

Here's a patch fixing both problems:

GenericKeyedObjectPool.java
990c990
< pool.addLast(new ObjectTimestampPair(obj));

> pool.addFirst(new ObjectTimestampPair(obj));
1094,1102c1094,1095
< }
<
< // if we don't have a keyed object pool iterator
< if (objIter == null) {
< final LinkedList list = (LinkedList)_poolMap.get(key);
< if (_evictLastIndex < 0 || _evictLastIndex > list.size()) { < _evictLastIndex = list.size(); < }
< objIter = list.listIterator(_evictLastIndex);

> LinkedList list = (LinkedList)_poolMap.get(key);
> objIter = list.listIterator(list.size());
1154,1155c1147
< _evictLastIndex = -1;
< objIter = null;

> key = null;
1547,1551d1538
<
< /**
< * Position in the _pool where the _evictor last stopped.
< */
< private int _evictLastIndex = -1;

I have a local unit test for this but it depends on some other code I can't
donate. It works like this:

1. Fill the pool with _maxTotal objects using many different keys.
2. Select X as a small number, e.g. 2.
3. Compute:
maxEvictionRunsNeeded = (maxTotal - X) / numTestsPerEvictionRun + 2;
maxEvictionTime = minEvictableIdleTimeMillis + maxEvictionRunsNeeded * timeBetweenEvictionRunsMillis;
4. Enter a loop:
a. Borrow X objects.
b. Exit if _totalIdle = 0
c. Return the X objects.

Fail if loop doesn't exit within maxEvictionTime.

Mike



 All   Comments   Work Log   Change History   Subversion Commits      Sort Order: Ascending order - Click to sort in descending order
Sandy McArthur added a comment - 30/Oct/06 04:18 AM
> Firstly, borrowObject() is returning the LRU object rather than the MRU object. That minimizes rather than maximizes object reuse and tends to refresh all the idle objects, preventing them from becoming evictable.

I think you have minimize and maximize backwards and reusing idle objects is why you'd want to use a pool. There are a couple of reasons for GKOP preferring the LRU idle object:

1. The reason to use a pool is that you have reusable objects that are expensive to create or dispose of. Using a FIFO ordering (LRU) means you use all the pooled objects over time instead of just a handful. Idle object will be kept fresh and ready to reuse or they will fail validateObject and be discarded. LIFO ordering (MRU) will tend to reuse the same objects and others will rarely be used under peak load, unless you have an idle object evictor to check their reusability ...

2. Using an idle object evictor can cause deadlocks. If the PoolableObjectFactory uses synchronization and the pool is accessed in a synchronized context it can lead to locks being acquired in the opposite order which creates a deadlock possible race condition and thread-safty cannot be guaranteed. Basically, avoid idle object eviction if you can.

3. A FIFO pool is implicitly optimized to be prepared to handle the next load spike. And that is the point of the pool. Optimizing the pool for low load levels reduces the important advantage of using a pool, being prepared for lots of work.

I can think of situations where the above doesn't apply but "Generic" pools are named "Generic" because they aren't optimized for any specific situation.

> Secondly, evict() itself has a couple of problems, both of which only show up when many keys are in play:

> Here's a patch fixing both problems:

If you could work the patch to only address the evict issue and attach it as a file instead of a copy/paste in it's own issue I'd appreciate it.


Sandy McArthur made changes - 30/Oct/06 04:18 AM
Field Original Value New Value
Assignee Sandy McArthur [ sandymac ]
Mike Martin added a comment - 30/Oct/06 05:52 PM
Sandy wrote:
> I think you have minimize and maximize backwards and reusing idle objects is
> why you'd want to use a pool. There are a couple of reasons for GKOP
> preferring the LRU idle object:
>
> 1. The reason to use a pool is that you have reusable objects that are
> expensive to create or dispose of. Using a FIFO ordering (LRU) means you use
> all the pooled objects over time instead of just a handful. Idle object will
> be kept fresh and ready to reuse or they will fail validateObject and be
> discarded. LIFO ordering (MRU) will tend to reuse the same objects and others
> will rarely be used under peak load, unless you have an idle object evictor to
> check their reusability ...

I think you're omitting a key point: the reason to use a pool is not just to
avoid object creation expense; it's also to minimize the number of objects
needed to satisfy demand. At the least, that is very true of the primary use
case I know of for pooling, DB connection pooling.

DB connections are a very expensive resource to lie around unused (or underused)
indefinitely. They are usually associated with an entire process on a DB
server. My company began reworking its pooling code using commons-pool
specifically to gain control over an exploding # of idle connections as the # of
DB users and # of servers grew (I'm talking hundreds of users and thousands of
connections).

So we can't simply say "avoid idle object eviction"; for many folks it's
absolutely necessary. And if the feature is there, it needs to work.

Idle eviction can only work properly using a LIFO list. Consider again the unit
test I described. It's not an exaggerated case. If I experience a momentary
load spike that opens, say, 100 concurrent connections, and it's followed by a
persistent "normal" load that needs only 2, I absolutely cannot continually
consume 100 connections thereafter when 2 would suffice. The FIFO list only
allows idleness to be detected if demand drops all the way to zero. Otherwise
it becomes impossible for the pool to realize that only 2 connections are now
sufficient; the 2 concurrent users keep cycling around all 100 connections, and
each of the 100 sees very little work.

In addition, consider that a LIFO list still works correctly if the user is
not using idle eviction. But the reverse is not true; FIFO ruins things when
idle eviction is needed.

There is no such thing as keeping idle objects "fresh". By definition, every
object in a pool is equivalent and interchangeable. You're right that LIFO
will tend to reuse the same objects, but that's the whole point! That is how
a pool tracks the demand being placed on it.

I've done quite a bit of analysis of GKOP's behavior in both cases, before and
after this fix, in a real-world production scenario. Idle eviction simply *does
not work* with a FIFO list unless demand drops to zero. Otherwise the pool
rises to its high-water mark and stays there.

So I hope you'll reconsider fixing this, and I'll attach the patch as a file for
you.

Thanks,

Mike


Mike Martin added a comment - 30/Oct/06 08:15 PM
Here's a copy of the patch as applied against the current trunk sources.

Mike Martin made changes - 30/Oct/06 08:15 PM
Attachment pool-86.patch [ 12343956 ]
Mike Martin added a comment - 30/Oct/06 09:07 PM
I've attached another patch which includes the necessary changes to the unit test.

Mike Martin made changes - 30/Oct/06 09:07 PM
Attachment pool-86.withtest.patch [ 12343958 ]
Sandy McArthur added a comment - 31/Oct/06 06:36 AM
I think the changes you suggest to the evict method will fail to make forward progress if getNumTests() is less than the number of idle objects for the current key. This is why the _evictLastIndex was needed. Could you rework the patch taking in to account small values of _numTestsPerEvictionRun and attach it to a seperate issue so that each issue only tracks one bug at a time.

Sandy McArthur added a comment - 31/Oct/06 07:00 AM
Back to the first part of your original submission: What I understand you really want is a way to prune the pool size down after a peak load spike.

Would a decorator that either occasionally discards returned objects or uses some heuristics to determine when to discard returned objects meet your needs? This could be implemented so that it doesn't need an eviction thread which means we can guarantee thread-safety of the pool implementation but it would require some pool activity to do it's work.


Mike Martin added a comment - 31/Oct/06 04:44 PM
Sandy wrote:
> I think the changes you suggest to the evict method will fail to make forward
> progress if getNumTests() is less than the number of idle objects for the
> current key. This is why the _evictLastIndex was needed.

_recentlyEvictedKeys can serve the same purpose. This change would do that:

— GenericKeyedObjectPool.java.prev Tue Oct 31 10:24:44 2006
+++ GenericKeyedObjectPool.java Tue Oct 31 10:25:06 2006
@@ -1207,6 +1207,7 @@
return;
}
key = keyIter.next();
+ _recentlyEvictedKeys.add(key);
LinkedList list = (LinkedList)_poolMap.get(key);
objIter = list.listIterator(list.size());
}
@@ -1259,7 +1260,6 @@
}
} else { // else done evicting keyed pool - _recentlyEvictedKeys.add(key); key = null; }
}

> Back to the first part of your original submission: What I understand you
> really want is a way to prune the pool size down after a peak load spike.
>
> Would a decorator that either occasionally discards returned objects or uses
> some heuristics to determine when to discard returned objects meet your needs?
> This could be implemented so that it doesn't need an eviction thread which
> means we can guarantee thread-safety of the pool implementation but it would
> require some pool activity to do it's work.

I think what I want, and what most DB connection pools need, is the existing
idle eviction facility as advertised.

I've never encountered the thread-safety problems you mention. Is there an
outstanding bug regarding that? If so, it's not occurring in my environment
which as I said involves hundreds of keys (DB users), thousands of connections,
and in fact is used by hundreds of threads.


Sandy McArthur added a comment - 31/Oct/06 11:51 PM
> I think what I want, and what most DB connection pools need, is the existing
> idle eviction facility as advertised.

I can appreciate that you probably think I'm just being bone headed. But idle object eviction as implemented by pool runs the risk of deadlocks and it cannot be fixed without breaking backwards compatibility. When considering only one usage of pool it is possible to use the evictor in a thread-safe manner but it would be irresponsible of me to encourage use of a risky pool feature when the desired results can be implemented in a manner that is thread-safe in every use case.

I'm trying to provide a solution that will meet your needs and be deadlock proof for everyone else at the same time. If you don't want to work with me on this then you are free to maintain your own version of a pool under the terms of the ASL-2.0.

> I've never encountered the thread-safety problems you mention. Is there an
> outstanding bug regarding that?

DBCP-65 is one example: https://issues.apache.org/jira/browse/DBCP-65

The problem is locks are acquired in reverse order. If there are two locks, ClientLock and PoolLock and the ClientLock is aquired by code using pool and by the PoolableObjectFactory provided to the pool you run the risk of deadlock when using an evitor.

Normal usage will acquire ClientLock, call a pool method which will acquire the internal PoolLock, and then the pool will call the PoolableObjectFactory which also acquires the ClientLock again. Effectively the lock acquistion order is ClientLock then PoolLock.

A pool with an evictor thread won't know about the ClientLock. After it wakes up to do idle object eviction it will acquire the PoolLock and the call the PoolableObjectFactory which will acquire the ClientLock. In the case the lock acquisition is PoolLock then ClientLock.

Acquiring locks in reverse order is the most basic example of how to cause a deadlock in any text on concurrency.

> If so, it's not occurring in my environment
> which as I said involves hundreds of keys (DB users), thousands of connections,
> and in fact is used by hundreds of threads.

I'm not privy to your code. I don't know if your code is thread-safe or just lucky. Just because pool works in a particular setup doesn't change the fact that idle object eviction is risky and should be avoided.

> _recentlyEvictedKeys can serve the same purpose. This change would do that:

Almost, this has the same problem of possibly skipping objects eligible for eviction which is the same problem we have now mentioned in item #2 of your original post.


Mike Martin added a comment - 01/Nov/06 03:36 AM
If you think through it, you'll realize that the lock ordering vulnerability
you're describing is not a function of there being an idle evictor thread. It
comes from the fact that there's a user callback (the factory) in play in the
first place.

If the locking policy of the pool is clearly documented (factory is called
while pool is locked), then the deadlock scenario you described is not
something the pool itself can avoid; that client's approach to the use of his
ClientLock is provably wrong to begin with.

Sounds like we'll need to maintain our own pool class. Thanks anyway.

Mike


Sandy McArthur added a comment - 13/Nov/06 01:41 AM
Committed some Poolutils.erodingPool decorators which addresses the issue of need a means to shrink the size of a FIFO ordered pool during low workloads. See: http://svn.apache.org/viewvc?view=rev&revision=474109

Sandy McArthur made changes - 13/Nov/06 01:41 AM
Fix Version/s Nightly Builds [ 12311772 ]
Resolution Fixed [ 1 ]
Status Open [ 1 ] Resolved [ 5 ]
Henri Yandell made changes - 18/Feb/07 05:40 PM
Fix Version/s Nightly Builds [ 12311772 ]
Fix Version/s 2.0 [ 12311985 ]
Rob Eamon added a comment - 04/Apr/07 06:33 PM
With respect to all the constraints on what can be done to the code, the resolution to this issue feels like a hack to me.

From Sandy's earlier post on Oct 29:

[quote]
1. The reason to use a pool is that you have reusable objects that are expensive to create or dispose of. Using a FIFO ordering (LRU) means you use all the pooled objects over time instead of just a handful. Idle object will be kept fresh and ready to reuse or they will fail validateObject and be discarded. LIFO ordering (MRU) will tend to reuse the same objects and others will rarely be used under peak load, unless you have an idle object evictor to check their reusability ...

2. Using an idle object evictor can cause deadlocks. If the PoolableObjectFactory uses synchronization and the pool is accessed in a synchronized context it can lead to locks being acquired in the opposite order which creates a deadlock possible race condition and thread-safty cannot be guaranteed. Basically, avoid idle object eviction if you can.

3. A FIFO pool is implicitly optimized to be prepared to handle the next load spike. And that is the point of the pool. Optimizing the pool for low load levels reduces the important advantage of using a pool, being prepared for lots of work.
[/quote]

1. We all agree on the purpose of the pool--to have idle objects ready to go when they are needed. Often it is also necessary to limit the number of objects in the pool. Where there is disagreement is on how the pool should behave.

If initial performance under peak load is paramount (want to have enough idle objects on hand), then the existing configuration parameters permit doing that. One sets the minIdle value and the testWhileIdle appropriately and one can assure having enough pooled objects to handle periodic usage bursts.

However, if one wants the pool to grow and shrink according to usage and wants the number of idle objects to quiesce when usage subsides (e.g. no sense consuming a client license when it isn't being used), then there is little that can be done if the pool is FIFO. As Mike points out, one or two threads could keep all pool objects "fresh" even when they aren't needed any longer.

2. Use of an evictor thread can definitely cause deadlocks--if the factory uses client locks. If it doesn, then it doesn't matter. And as Mike points out, if the programmer knows the behavior of the pool, then they can presumably design around that to avoid deadlock.

Are you saying that the evictor thread should never have been implemented? Is it going to be deprecated? What would you do as an alternative, if you had a clean slate?

3. Having the pool large enough to handle peak load is but one of many constraints that the pool concept is designed to address. As mentioned above, if one needs a minimum pool size at the ready for peak loads this can be managed using configuration. But if one doesn't need that and insteads needs the pool to grow and shrink using the max/min sizes and idle timeouts then the current implementation inhibits shrinking of the pool.

Wrapping the pool classes with the eroding class seems like a stop-gap added to address something that should be part of the core pool implementation. While throwing away the object being returned vs throwing away an idle and expired object from the pool is a wash, it seems useful for the pool implementation to have some sort of mechanism to purge expired objects from the pool based simply on idle time. How that is implemented, I don't have much preference, but the current implementation clearly needs a tweak.

I respectfully request that this solution be reconsidered.


Phil Steitz added a comment - 21/Jun/07 10:44 PM
I agree with Rob that a better solution should be provided - either make GenericObjectPool's LRU/MRU behavior configurable or provide an alternative MRU-based impl. Robust alternatives are available in the compositepool package, so one resolution is to close this on release of pool 2.0, but I would like to consider addressing this in the 1.x branch as well.

Phil Steitz made changes - 21/Jun/07 10:44 PM
Status Resolved [ 5 ] Reopened [ 4 ]
Resolution Fixed [ 1 ]
Assignee Sandy McArthur [ sandymac ] Phil Steitz [ psteitz ]
Repository Revision Date User Message
ASF #582538 Sat Oct 06 22:17:41 UTC 2007 psteitz Fixes to address POOL-86
* Made LIFO/FIFO behavior configurable for GenericObjectPool and
  GenericKeyedObjectPool, with default set back to LIFO
  (reverting to 1.2 behavior)
* Fixed GOP, GKOP evict method and added tests to ensure objects
  are visited in oldest-to-youngest order
* Changed backing store for GOP, GKOP pools back to Commons Collections
  CursorableLinkedList (brought this class in, repackaged with
  package scope).
JIRA: POOL-86
Files Changed
MODIFY /commons/proper/pool/trunk/src/test/org/apache/commons/pool/impl/TestGenericKeyedObjectPool.java
ADD /commons/proper/pool/trunk/src/test/org/apache/commons/pool/VisitTracker.java
MODIFY /commons/proper/pool/trunk/src/java/org/apache/commons/pool/impl/GenericKeyedObjectPool.java
ADD /commons/proper/pool/trunk/src/test/org/apache/commons/pool/VisitTrackerFactory.java
MODIFY /commons/proper/pool/trunk/src/java/org/apache/commons/pool/impl/GenericObjectPool.java
MODIFY /commons/proper/pool/trunk/xdocs/changes.xml
MODIFY /commons/proper/pool/trunk/src/test/org/apache/commons/pool/impl/TestGenericObjectPool.java
ADD /commons/proper/pool/trunk/src/java/org/apache/commons/pool/impl/CursorableLinkedList.java

Phil Steitz added a comment - 07/Oct/07 12:09 AM
Fixed for GenericObjectPool, GenericKeyedObjectPool in r582538.

Phil Steitz made changes - 07/Oct/07 12:09 AM
Resolution Fixed [ 1 ]
Status Reopened [ 4 ] Resolved [ 5 ]
Mark Thomas added a comment - 08/Dec/07 08:52 AM
I've looked through the code and the patch looks good to me.