Under sustained high load (100-200 requests per second) over periods of 12+ hours, we regularly hit a situation where all of the serving threads were blocked except for one thread looping infinitely in org.apache.naming.resources.ResourceCache#allocate(). After quite a bit of effort, we tracked this down to three minor bugs in the ResourceCache code: two in the cacheSize accounting, and one off-by-one error in the range used when randomly generating the index of an entry to consider for freeing. We've created a local patch and verified that it fixes the issue for us, including monitoring the correctness of the cacheSize accounting. I will be attaching a patched source for ResourceCache.java. Please consider incorporating these fixes into subsequent revisions. The bug appears both in Tomcat 5.0.30 and in the latest trunk code.
Anil Gangolli (anil@busybuddha.org / gangolli@apache.org)
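For readers following along, here is a minimal sketch of the general kind of off-by-one error described above. It is not the ResourceCache code, and the report does not show the original expression. The class and method names are hypothetical, and the "bound one too small" form is an assumption made purely for illustration. It relies only on the documented behaviour of java.util.Random#nextInt(int), which returns a value from 0 (inclusive) to the bound (exclusive).

```java
import java.util.Random;

// Hypothetical illustration only; not taken from Tomcat's ResourceCache.
class IndexPickSketch {

    // Assumed faulty form: the bound is one too small, so the last
    // index (length - 1) can never be selected.
    static int pickTooNarrow(Random random, int length) {
        return random.nextInt(length - 1);
    }

    // Full-range form: nextInt(length) yields 0 .. length - 1.
    static int pickFullRange(Random random, int length) {
        return random.nextInt(length);
    }

    // Draws repeatedly and reports whether the last index was ever chosen.
    static boolean everPicksLast(boolean narrow, int length, int draws, long seed) {
        Random random = new Random(seed);
        for (int i = 0; i < draws; i++) {
            int index = narrow ? pickTooNarrow(random, length)
                               : pickFullRange(random, length);
            if (index == length - 1) {
                return true;
            }
        }
        return false;
    }
}
```

With the narrow bound, an entry sitting in the last slot is never considered for freeing, which is one way a loop that keeps drawing candidates could fail to make progress.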
Created attachment 16353
Patched version of ResourceCache.java (full text)
This file contains the full source including the patches. Each patch has a comment describing it. The changes should be evident using diff. If you need anything else, or further explanation, please let me know.
This seems like a good patch, but you should submit diffs rather than full files.
Thanks. I'll submit diffs as well tomorrow. I'm not near the code presently.
(In reply to comment #3)
> Thanks. I'll submit diffs as well tomorrow. I'm not near the code presently.
That should be ok; it's just that diffs are much easier to work with.
I applied the patch. Thanks a lot; it would have been impossible to debug without knowing the exact usage. The case where insertCache "fails" because the entry is already present is a race between allocate and insert, so it is best to perform another lookup under the sync before trying, which avoids the unneeded allocate:

Index: ProxyDirContext.java
===================================================================
RCS file: /home/cvs/jakarta-tomcat-catalina/catalina/src/share/org/apache/naming/resources/ProxyDirContext.java,v
retrieving revision 1.18
diff -u -r1.18 ProxyDirContext.java
--- ProxyDirContext.java	20 Jul 2005 21:25:18 -0000	1.18
+++ ProxyDirContext.java	12 Sep 2005 10:43:35 -0000
@@ -1596,7 +1596,7 @@
             // Add new entry to cache
             synchronized (cache) {
                 // Check cache size, and remove elements if too big
-                if (cache.allocate(entry.size)) {
+                if ((cache.lookup(name) == null) && cache.allocate(entry.size)) {
                     cache.load(entry);
                 }
             }
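The idea behind that change can be sketched in isolation. The class below is a hypothetical, heavily simplified stand-in for a size-bounded cache. It is not Tomcat's ResourceCache, it never evicts, and its names and signatures are assumptions chosen to mirror the calls in the diff. It only shows why repeating the lookup inside the synchronized block keeps a thread that lost the race from reserving space a second time.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical simplified cache; illustrative only, not the Tomcat class.
class GuardedCacheSketch {
    private final Map<String, Integer> entries = new HashMap<>();
    private final int maxSize;
    private int cacheSize = 0; // running total of reserved entry sizes

    GuardedCacheSketch(int maxSize) {
        this.maxSize = maxSize;
    }

    // Returns the stored size for the name, or null if it is not cached.
    Integer lookup(String name) {
        return entries.get(name);
    }

    // Reserves room for an entry; this sketch simply refuses when full.
    boolean allocate(int size) {
        if (cacheSize + size > maxSize) {
            return false;
        }
        cacheSize += size;
        return true;
    }

    // Stores the entry; its space must already have been reserved.
    void load(String name, int size) {
        entries.put(name, size);
    }

    int cacheSize() {
        return cacheSize;
    }

    // Lookup, allocate and load all happen under one lock, so an entry
    // that another thread already inserted is detected before allocating.
    boolean insertIfAbsent(String name, int size) {
        synchronized (this) {
            if (lookup(name) == null && allocate(size)) {
                load(name, size);
                return true;
            }
            return false;
        }
    }
}
```

Without the lookup, a second insert of the same name in this sketch would add to cacheSize again while the map still holds a single entry, so the size accounting would drift upward.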
Thanks for the rapid response. Will this fix be present in any future Tomcat 5.0.x releases as well? Any chance of getting this?