Issue Details (XML | Word | Printable)

Key: POOL-122
Type: Bug Bug
Status: Closed Closed
Resolution: Fixed
Priority: Major Major
Assignee: Unassigned
Reporter: Marcus Schulte
Votes: 0
Watchers: 0
Operations

If you were logged in you would be able to see more operations.
Commons Pool

java.util.Timer in EvictionTimer does not recover from OutOfMemoryError in Evictor

Created: 02/Feb/08 04:01 PM   Updated: 19/Jun/09 04:13 PM
Return to search
Component/s: None
Affects Version/s: 1.3, 1.4
Fix Version/s: 1.5

Time Tracking:
Not Specified

File Attachments:
  Size
Text File Licensed for inclusion in ASF works fixes_POOL-122.patch 2008-02-02 06:50 PM Marcus Schulte 4 kB
Environment: any

Resolution Date: 01/Apr/09 04:46 PM


 Description  « Hide
GenericKeyedObjectPool.Evictor.run() catches and ignores Exceptions, but not Errors, like OOME. Consequently, when, due to load-peeks an OOME is thrown in the evictor's timer-thread it dies miserably and no eviction will happen again in any of the pools loaded within the same class-loader (because the eviction timer is a static member). Also, the creation of evicting pools will fail with IllegalStateException.

Possible fixes:
1. catch Throwable in GenericKeyedObjectPool.Evictor.run()
2. check and eventually re-instantiate the Eviction-Timer.



 All   Comments   Work Log   Change History   Subversion Commits      Sort Order: Ascending order - Click to sort in descending order
Mark Thomas added a comment - 02/Feb/08 06:49 PM
Once there has been an OOME the only safe course of action is to shut the JVM down, assuming you have enough control left to be able to do that. Swallowing the OOME will make it impossible to detect that the system needs to shut down, so I am against any such change.

Marcus Schulte added a comment - 02/Feb/08 06:50 PM
Added a patch implementing solution 1 plus unit test.

Marcus Schulte added a comment - 02/Feb/08 07:12 PM
I can see your argument, but still want to put forward that
  • If you encounter an OOME in one thread you will very probably also see it in others, among them, for example, the main worker threads of the container (or your main-thread).
  • All containers I know log the error, if it occurs in a thread controlled by them, and continue to work as well as they can.
  • The current implementation silently swallows Exception which is not that much better when it comes to error diagnostics.
  • Instead, one could catch Throwable and log to stderr - which is basically what the vm does anyway – minus terminating the evictor-thread.

Phil Steitz added a comment - 02/Feb/08 08:31 PM
I agree with Mark that swallowing Throwable here is not a good idea, so am also -1 on the patch.

Assigning fix version 1.5 and leaving open because we should log the error.

Also affects 1.4, since behavior there is the same. One more comment I have on this is that while we need to and will continue to support the Evictor, heavily loaded applications should be careful depending on it as a pseudo-garbage collector. OOME's could indicate resource leaks and these should be investigated and addressed directly (assuming that is contributing to the problem).


Mark Thomas added a comment - 01/Apr/09 03:00 PM
This has been fixed by logging the throwable (to stderr) and trying to carry on. Note that exceptions are still silently swallowed.

Joerg Schaible added a comment - 01/Apr/09 04:39 PM
I am -1 for the commited solution either. There's no real difference. You may never catch a ThreadDeath. All we should do is to handle OOME instead of the Throwable and try to recover from that. Anything else is really bad.

Mark Thomas added a comment - 01/Apr/09 04:46 PM
Scope reduced as requested. I'm getting close to pulling the whole patch and marking this as WONTFIX as it is, in my view, a generally pointless exercise.

Joerg Schaible added a comment - 01/Apr/09 09:19 PM
Well, handling OOME might be really useful. It is not of help, if you have a memory leak, but sometimes a user simply tries to process a really big document, image or something similar and the system recovers immediately from memory shortage as soon as the silly process has been stopped. Thanks for changing the implementation agian, though.

Phil Steitz added a comment - 19/Jun/09 04:13 PM
Fixed in 1.5 release.