org.apache.catalina.core.StandardContext.stop() method changes the current thead's context classloader but does not always reset it if an exception is thrown. It resets the classloader in the finally clause of a try statement, but the try statement begins too late. If an exception is thrown before the try statament is reached, the contextClassLoader is not reset to its original value. Here's the code fragment: // Binding thread ClassLoader oldCCL = bindThread(); // Stop our filters filterStop(); // Stop our application listeners listenerStop(); // Stop ContainerBackgroundProcessor thread super.threadStop(); if ((manager != null) && (manager instanceof Lifecycle)) { ((Lifecycle) manager).stop(); } // Finalize our character set mapper setCharsetMapper(null); // Normal container shutdown processing if (log.isDebugEnabled()) log.debug("Processing standard container shutdown"); // Notify our interested LifecycleListeners lifecycle.fireLifecycleEvent(STOP_EVENT, null); started = false; try { ... } finally { // Unbinding thread unbindThread(oldCCL); } An exception is sometimes thrown in this statement (which is before the try statement): if ((manager != null) && (manager instanceof Lifecycle)) { ((Lifecycle) manager).stop(); } due to bug 30489 (java.lang.IllegalStateException: removeAttribute: Session already invalidated) Leaving the substituted classLoader severely disrupts the JBoss deployment scanner causing it to throw an exception and fail in each loop. Restarting JBoss is required to recover from this problem.
I read with interest your report and wonder if some other infrequenct problems that I've experienced in the past maybe related to this section of code. Would it make sense to use code like: try { filterStop(); } catch(Exception e) { e.printSomething(); } try { listenerStop(); } catch(Exception e) { e.printSomething(); } etc... the idea being that each invokation to web-application code (from TC internal code) should not be able to upset TCs shutdown mechanism and should try a best effort to do everything properly. Then at the end of it all just cut away the web-app and cleanup. My suggestion above is making a huge presumption that filterStop() invokes all the active javax.servlet.Filter#destroy() filters under the context to shut them down. Looking briefly through that particular path I can't see any trapping of web-app exceptions. StandardContext#filterStop() ApplicationFilterConfig#release() javax.servlet.Filter#destroy() In the absense of any mandated behaviour (from JSRs) I would think the exceptions are best caught in ApplicationFilterConfig#release() so that other filters within the same context get called (call it "best effort"). I dont understand the org.apache.catalina.SecurityUtil stuff in the same place, maybe there is other ways to invoke javax.servlet.Filter#destroy() that need the same treatment. Its very important that TCs internal accounting state can never be disrupted no matter how badly a web-app behaves. An audit of this code to wrap the whole section with a suitable finally clause to ensure TCs internal state retains its integrity makes perfect sense to me. The two things together can only increase robustness with minimal cost. Possibily related bug reports discussing web-app generated Exceptions during context shutdown: Bug #37498 Bug #36250 => Comment #14 (http://issues.apache.org/bugzilla/show_bug.cgi?id=36250#c14)
Created attachment 17909 [details] Defensive coding patch for all bind/unbind usage NEEDS PEER REVIEW This is a patch base on a code walk (rather than a full understanding of the code) and some re-work to eliminate potential problem that seem possible from the perspective of that code walk. Namely protection from Exception generation at the wrong moment between bind/unbind calls. There are two minor changes in behaviour with this patch, which I think are more correct than the original (but then I didn't take the time the fully understand the code so please excuse me). #start() around line 4063 if ok==false we don't call bindThread() now. Since the only way we get to call unbind is in the finally block within the if(ok) condition. The means the code around line 4155 to 4166 may or may not be in bound state, depending upon ok==true|false, was this originally intended ? #start() around line if unbindThread(oldCC) were to throw an exception, before we would call unbindThread(oldCC) again, twice in a row, maybe this is safe anyway but now we only call it the once. #stop() as the bug report suggests wrapping all the calls within the try block make defensive sense to me. Other than that for the normal code path execution should be the same as before, but nows its very easy to see from the code that flipping between bind and unbind is better protected from exceptions.
(In reply to comment #2) > Created an attachment (id=17909) [edit] > Defensive coding patch for all bind/unbind usage NEEDS PEER REVIEW It doesn't look like a good patch. > Its very important that TCs internal accounting state can never > be disrupted no matter how badly a web-app behaves It's very funny. Let's see: - ways to guarantee thread release: none - ways to guarantee memory release: none - ways to guarantee classloading is clean (ex: log4j): none Please write a letter to Sun for me :)
I have improved the try/finally, especially in stop.
Thats really great, thanks very much.
Created attachment 17923 [details] Defensive coding patch for 5.0.xx This is based on Remy's SVN commit, I think it would benifit 5.0.x users.
Created attachment 17924 [details] Defensive coding patch for 4.1.xx I'm sorry there are too many hoops in the BUILDING.txt for me to even test compile this for 4.1.xx. Again shoud be based on Remy's SVN commit.
Reopened for additional patch review.
Changing version to 5.0.30, as this was taken care of by Remy for the 5.5 branch, so 5.0.30 is the latest relevant version. Also downgrading priority from critical to normal.
Done for 4.1.x
This will nto be fixed in 5.0.x since that branch is no longer supported.