We have detected a situation where Tomcat will leak memory when web applications are undeployed. Specifically, when the following 3 conditions are met: a) SingleSignOn valve is used b) the web app requires the user to be authenticated to get a session c) the web app contains commons-logging.jar *and* log4j.jar This has been observed with Tomcat 5.5.7 Simply deploy a web app under the above conditions, visit the web app so that a session for that app is stored in the SingleSignOn data structures, then undeploy the web app. At this point tomcat leaks a reference to the class loader for the undeployed web app. Repeat, and eventually this will cause OutOfMemoryError. What happens is that an instance of StandardSession is stored as a key in the "reverse" HashMap of SingleSignOn. The instance of StandardSession also refers to an instance of StandardManager in its "manager" field. The classes for the SingleSignOn, StandardSession and StandardManager all appear to be loaded by the "catalina" StandardClassLoader. However, (and I think this is where the leak happens), the "log" field of the StandardManager refers to an instance of: org.apache.commons.logging.impl.Log4JLogger That Log4JLogger class is loaded by the WebAppClassLoader for our web app, hence making a hard reference to that class loader. When the web application is undeployed, the hard reference to the dead classloader remains in the data structures of SingleSignOn. I think it only gets cleaned up when the session is terminated. This has lead to exhaustion of the PermGen for our applications, where it is quite normal to create/deploy/undeploy web applications in production (not just during development). I can supply a very simple web-app to demonstrate this effect if required. Not sure if this is one bug, or several: 1) Why is the logger loaded from the web app class loader? 2) Why does the SingleSignOn mechanism not remove ALL sessions for ALL users related to a given web app when the web app is undeployed. Any ideas for fixes for this? (Its seriously delaying a product release at present.) Many thanks, Kev
You might want to look into the code a little bit more, and see that SSO should get an event for all sessions of a particular webapp when it is stopped. If you think there are cases where something is not properly cleaned up, then please submit a patch.
(In reply to comment #1 from Remy) > You might want to look into the code a little bit more, and see that SSO should > get an event for all sessions of a particular webapp when it is stopped. Remy, OK I'm confused... I've only just started to look at the internals of Tomcat, so its kind of hard to initially get my head around the codebase. However, I've been instrumenting a local copy of the 5.5.7 source with lots of extra logging to try to follow the logic. From what I can tell, the SingleSignOn is NOT notified in any way when an application is undeployed. The only mechanism I could see would be if it was told when the individual sessions were expired. So I had a look at session.StandardManager, which seems to deal with persisting and expiring all of the sessions when an application is STOPped. However, looking at StandardManager.doUnload(), the look that expires the sessions calls session.expire(false) - i.e. does not notify listeners, hence the SSO never gets told. Did I miss something obvious? (I am trying to look into the problem and help out here!) In the meantime, a colleague is still fighting with the loggers and class loaders, but I think we can accept that is a separate issue. Thanks, Kev
Yes, that's a very good point about the expire(false) when a webapp is stopped. I didn't think about that (to be honest, I don't know the SSO code that well either, but looking at the code I had the impression that this was working, and the problems would be caused by something else). I'll try to see if there is a way to do better cleanup, or to remove the need for holding references to sessions.
I tried the obvious quick solution of changing expire(false) to expire(true), but unfortunately that had a nasty side-effect. If the user is logged in to a SSO session, and is using several web apps, then if one of those web apps is undeployed, their *entire* SSO is destroyed. This means they are immediately logged out (of all the apps), and their next interaction with any of the other web apps will force them to log in again (which is obviously not what was intended). I'll keep digging and trying to follow the code base. (Out of interest, does all of the code have as few comments as this session/authentication stuff?) We've downgraded our upcoming product release to a beta (and use a single shared log instead of per webapp logging) until we figure a way to get the SSO to release the ref to the classloader. Thanks again, and please do let us know if you have any ideas for how to change this SSO behaviour. BTW - Is there a tomcat test suite I should be using so that I know if I have broken anything while I look at this issue? Being new to the tomcat code, I'm quite wary of breaking stuff via strange side-effects. Cheers, Kev
I had a bit more of a look at this. Again, please be aware that I am new to this code base. The only mechanism I can see for an instance of SingleSignOn to be told that a WebApp has been undeployed is if it were to be told when the sessions related to that app were destroyed (i.e. if StandardManager.doUnload() were to be changed to call expire(true) instead of expire(false)). However, the listener SingleSignOn.sessionEvent(SessionEvent) assumes that it is only ever called for one of two reasons - either the user logged out, or the session expired. But what we have is a third case - one of the apps being undeployed. Now, it seems that if we can detect the third case then the method to be called has already been implemented (if not tested)... The method SingleSignOn.deregister(ssoId, session) [note the 2nd param] already exists and looks like it is designed for the job, but I can't find anything that calls it! So the question is: How does the listener method detect the third case? I can think of hacks involving overloading the expire(notify) method with an extra param and using that to put some marker in the data field of the SessionEvent, but I can't help but think I'm missing something obvious by not knowing the code base. Ideas? Kev
> So the question is: How does the listener method detect the third case? I can > think of hacks involving overloading the expire(notify) method with an extra > param and using that to put some marker in the data field of the SessionEvent, > but I can't help but think I'm missing something obvious by not knowing the > code base. I can't find a cleaner way, but I have this work using the above technique. Should I provide a patch against the 5.5.7 src (which is what I'm using), or do patch have to be against the CVS head? I've not looked at other managers such as Persistent / clustering etc, as I don't know much about those. Kev
Created attachment 14379 [details] Fix session leak in SingleSignOn Since there was no answer to how you wanted the patch, I have a patch against 5.5.7. Its a unified diff of everything in: jakarta-tomcat-5.5.7-src\jakarta-tomcat-catalina\catalina\src\share\ Its pretty small - changes to 4 files: catalina\Session.java catalina\authenticator\SingleSignOn.java catalina\session\StandardSession.java catalina\session\StandardManager.java Regards, Kev
Now that 5.5.9 is released, I can look into more complex/risky issues. I see you have done a lot of research, which identifies the problem very clearly. However, I don't really like the proposed solution. How about simply setting the manager field to null since this is the cause of the problem ? Calling recycle on all sessions in StandardManager.stop could do this easily. Index: StandardManager.java =================================================================== RCS file: /home/cvs/jakarta-tomcat-catalina/catalina/src/share/org/apache/catalina/session/StandardManager.java,v retrieving revision 1.28 diff -u -r1.28 StandardManager.java --- StandardManager.java 7 Feb 2005 21:56:32 -0000 1.28 +++ StandardManager.java 29 Mar 2005 12:32:32 -0000 @@ -670,12 +670,16 @@ Session sessions[] = findSessions(); for (int i = 0; i < sessions.length; i++) { StandardSession session = (StandardSession) sessions[i]; - if (!session.isValid()) - continue; try { - session.expire(); + if (session.isValid()) { + session.expire(); + } } catch (Throwable t) { ; + } finally { + // Measure against memory leaking if references to the session + // object are kept in a shared field somewhere + session.recycle(); } }
Created attachment 14584 [details] Call recycle on all sessions (passivated or expired) on stop
(In reply to comment #8) Hi Remy, Thanks for coming back to this issue - the memory profile of Tomcat during repeated web application deploy/undeploy/redeploy cycles on PRODUCTION (no dev) servers is a big issue for us. Your efforts are much appreciated. > Now that 5.5.9 is released, I can look into more complex/risky issues. Is there a long gap expected then before a 5.5.10? Currently I think we're shipping a modified 5.5.7 (last stable, plus patches - all noted in bugzilla). > I see you have done a lot of research, which identifies the problem very > clearly. However, I don't really like the proposed solution. How about simply > setting the manager field to null since this is the cause of the problem ? > Calling recycle on all sessions in StandardManager.stop could do this easily. Maybe BOTH approaches are required? I don't have my test setup from back when I was looking at this now, but doesn't your proposed solution still leave a reference to the (now recycled) session object in the reverse map in SingleSignOn? I could be missing something - its been a while since we looked at this particular problem. [off topic 1] At the time I was evaluating a commercial memory profiling tool - do you have any recommendations? [off topic 2] Any chance of eclipse .project and .classpath files shipping in the source distribution? Thanks, Kev
To be honest, I am not actually interested in fixing your particular problem. I was thinking first about the generic problem where references to sessions are kept for some reason by any component. So the feedback I was most interested in is: does this fix the reference leak to the classloader ? The design flaw is then in the SSO reverse map if you want to avoid leaking skeleton objects. One easy solution might be to change it to a WeakHashMap. I do not have a good memory profiler to recommend (I don't like any at the moment).
You might want to give the memory profiler in NetBeans 4.x a try. OptimizeIt has some advantages over this, but NetBeans is free and OptimizeIt... isn't.
Created attachment 14593 [details] Always recycle sessions; add activate/passivate events and use passivate for SSO
New cumulative patch now addressing the SSO problem. Please test it (and fix it if there are problems).
I think this has been fixed. Too bad there's no feedback.
(In reply to comment #15) > I think this has been fixed. Too bad there's no feedback. I'm looking to verify this this week. Looks good so far but didn't want to say yes until I'm certain. K.
I had a another look into this, and seems better, but I'm not completely convinced that it is entirely fixed. I can't pin down exactly what is happening, but looks like the very first deployment does not fully undeploy (release all references), but subsequent cycles appear to work correctly. The reason it is odd is because after the first cycle my profiler does not list my "BigServlet" class, but the 20MB it allocates never appears to be released (but for each later cycle the graph shows their heap allocations are released). So, sorry but I can't really give a definite answer. Yes it is a big improvement, but no I can't say it is entirely fixed. My cycle is: 1) visit some application that does a sign-in for SSO 2) use the manager to list the apps 3) memory profiler snapshot 4) drop a leak.war in the web apps folder (auto deploy) 5) visit leak with a browser 6) undeploy leak using manager web app 7) repeat from 2 several times, and compare snaphots as well as graphing various memory stats and heap, etc. Thanks for all the efforts! Kev
What did you test exactly (CVS version ?) ? Still using log4j for everything ? Do you have any idea on where the reference to the classloader would be kept ?
(In reply to comment #18) > What did you test exactly (CVS version ?) ? Still using log4j for everything ? > Do you have any idea on where the reference to the classloader would be kept ? I updated to CVS head just after you committed the changes (31/3 maybe? can't remember). The content of my test WAR is: >jar -ft leak.war index.jsp META-INF/context.xml META-INF/MANIFEST.MF META-INF/ WEB-INF/classes/ WEB-INF/classes/BigServlet.class WEB-INF/lib/ WEB-INF/lib/commons-logging.1.0.3.jar WEB-INF/lib/log4j.1.2.8.jar WEB-INF/web.xml WEB-INF/ Context.xml is simply: <Context antiResourceLocking="true" antiJARLocking="true" > </Context> BigServlet simply has a static byte[] to allocate 20MB, initialised in the init () method, and also logs a message. Web.xml simply specifies it is loaded on startup, and that everything in the web app required the user to be in a given role. The JSP simply displays the remoteUser and SessionID. [Note: this configuration above is just for my test web app and we try to work around it. For production we include the Log4J jar only in our web app, and then use separate jdk logging for anything that is done through commons- logging.] As for ideas as to what is holding the reference, no I don't have ideas. The profiler I'm currently evaluating isn't helping me there :-( The best I used so far (and was using back when I first reported) was YourKit4.0, but my eval has expired. K.