Summary: | connector without accept thread (was: hanging during SSL negotiation) | ||
---|---|---|---|
Product: | Tomcat 5 | Reporter: | Ralf Hauser <hauser> |
Component: | Connector:Coyote | Assignee: | Tomcat Developers Mailing List <dev> |
Status: | RESOLVED INVALID | ||
Severity: | enhancement | ||
Priority: | P3 | ||
Version: | 5.0.27 | ||
Target Milestone: | --- | ||
Hardware: | Other | ||
OS: | other | ||
URL: | http://marc.theaimsgroup.com/?l=tomcat-dev&m=108897347220438&w=2 | ||
Attachments: | threadDump3Connectors.txt |
Description
Ralf Hauser
2004-07-04 08:21:56 UTC
- Mozilla hangs after saying "Connected to hostname.domain.tld..." in the status bar, without a timeout (http://bugzilla.mozilla.org/show_bug.cgi?id=249976)
- MSIE 6.0.2800.1106.xpsp2.030422-1633 says in the status bar "opening page https://hostname.domain.tld:port/..." - no timeout
- Netscape 4.7: "Connect: host hostname.domain.tld:8443 contacted - waiting for reply" - no timeout
- Red Hat 9 ELinks 0.4.2 (text WWW browser): "SSL negotiation" and no timeout
- Red Hat 9 Links 0.96: "SSL negotiation" and no timeout
- Opera 7.50K: 'Sende Anfrage an hostname.domain.tld...' ("Sending request to hostname.domain.tld...")

If anybody has a recommendation which browser to use to learn where it breaks down, this would be highly appreciated!

Links 0.96 gives a late timeout: "Error Receive timeout [cancel]"

I really don't see anything to fix here (if you think otherwise, please propose a patch). You have to remember Tomcat isn't implementing SSL (JSSE does that).

The following is one hypothesis for this, but we have also seen it happen on non-SSL connectors (i.e. regular HTTP). Anyway, the reasoning goes as per http://forum.java.sun.com/wireless/thread.jsp?forum=2&thread=502002&message=2374415#2374415
===
We had exactly the same problem and it appeared to be a problem with /dev/random. It seems that if there's not enough entropy, then the usage of /dev/random might block waiting for entropy. You probably don't have enough key pressings, mouse usage, net traffic, etc. for the entropy required to be filled. You can set the system property "java.security.egd" to "file:/dev/urandom" to use /dev/urandom instead. /dev/urandom is secure enough for SSL connection usage. Hope this helps you! - Atso.
===
See also: http://lists.gnupg.org/pipermail/gnupg-devel/2003-February/019736.html

More insights on this:
- Forget the stack trace in the original description; it probably came from our watcher script, which automatically restarted Tomcat (and I didn't know about this).

However:
- The problem that a connector hangs happens with all 3 connectors I have configured in my server.xml (8443 with https, 8080 and 2712 with plain http).
- What is still possible is to send Tomcat a kill -QUIT to obtain a full thread dump. Nothing in it appears to be particularly special, since the server is not very busy.

For each connector I see up to minSpareThreads="25" threads, always one of them in java.net.PlainSocketImpl.socketAccept() and the rest in java.lang.Object.wait(). This is the healthy situation. Once Tomcat hangs on one or more of the connectors, all threads of that connector are in the wait state and none is in socketAccept.

In the logs, there is no indication why the connector should lose its accepting thread. The only stack traces I see come from the Struts file download when somebody aborts it, and from other simple errors in the servlet application, typically leading to:
  ...
  at org.apache.struts.action.ActionServlet.doPost(ActionServlet.java:525)
  at javax.servlet.http.HttpServlet.service(HttpServlet.java:717)
  at javax.servlet.http.HttpServlet.service(HttpServlet.java:810)
  ...
  at org.apache.catalina.core.ContainerBase.invoke(ContainerBase.java:929)
  at org.apache.coyote.tomcat5.CoyoteAdapter.service(CoyoteAdapter.java:160)
  ...

Any hints on how to remedy this would be highly appreciated! I'll attach the JVM thread dump next.
P.S.: This appears to have some similarity to the mailing-list description at the above URL.
P.P.S.: There is no preceding out-of-memory situation as in Bug 31426.

Created attachment 12969
threadDump3Connectors.txt
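The healthy-versus-hung pattern described above (one thread per connector in socketAccept, the rest in Object.wait) can be checked mechanically on a saved thread dump. The following is only a rough sketch under assumptions: the class and method names are made up for illustration, it assumes each thread block in the dump starts with a line holding the quoted thread name, and it groups threads by name with trailing digits stripped, which may or may not match how a given Tomcat version names its connector threads.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of Tomcat.
final class AcceptThreadCheck {

    /**
     * Maps each thread group (thread name minus trailing digits) to the
     * number of its threads that are currently inside socketAccept.
     */
    static Map<String, Integer> acceptorsPerGroup(String dump) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        String group = null;
        for (String line : dump.split("\\r?\\n")) {
            if (line.startsWith("\"")) {
                // Assumed header format: "thread-name" followed by details.
                int end = line.indexOf('"', 1);
                if (end < 0) {
                    group = null;
                    continue;
                }
                group = line.substring(1, end).replaceAll("\\d+$", "");
                if (!counts.containsKey(group)) {
                    counts.put(group, 0);
                }
            } else if (group != null
                    && line.contains("PlainSocketImpl.socketAccept")) {
                counts.put(group, counts.get(group) + 1);
                group = null; // count each thread at most once
            }
        }
        return counts;
    }
}
```

A group that looks like a connector pool but reports 0 would correspond to the hung state described in this report; groups that never accept connections (GC threads, timers, and so on) will also report 0 and have to be ignored by eye.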
It seems that we have found the solution (fingers crossed ;) ):
Apparently, Red Hat 9 was the first Red Hat release to use the new POSIX
threads implementation (NPTL), backported by Red Hat from the 2.6 kernel to 2.4.
Unfortunately, this appears to be buggy (e.g. CommuniGate Pro also has not been
"cleared" for Red Hat 9).
How to fix:
> mv /lib/tls /lib/tls.unused
> ldconfig
> catalina.sh restart
--> this brings back the old, classic LinuxThreads implementation - slower, but
hopefully more robust.
Afterwards, "less /proc/PID/maps" for the Tomcat process
should no longer show any shared objects from /lib/tls.
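The /proc/PID/maps check can also be scripted. This is a minimal sketch, assuming a Linux /proc filesystem; the class name TlsMapsCheck and its methods are hypothetical and only illustrate the check described above.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Tomcat or the JDK.
final class TlsMapsCheck {

    /** Returns the entries of a /proc/PID/maps listing that point into /lib/tls/. */
    static List<String> tlsMappings(List<String> mapsLines) {
        List<String> hits = new ArrayList<>();
        for (String line : mapsLines) {
            // The trailing slash keeps a renamed /lib/tls.unused from matching.
            if (line.contains("/lib/tls/")) {
                hits.add(line.trim());
            }
        }
        return hits;
    }

    public static void main(String[] args) throws IOException {
        // Pass the PID of the Tomcat JVM; "self" inspects this helper's own JVM.
        String pid = args.length > 0 ? args[0] : "self";
        List<String> hits =
                tlsMappings(Files.readAllLines(Paths.get("/proc", pid, "maps")));
        if (hits.isEmpty()) {
            System.out.println("No shared objects from /lib/tls are mapped.");
        } else {
            System.out.println("Still mapped from /lib/tls:");
            for (String hit : hits) {
                System.out.println("  " + hit);
            }
        }
    }
}
```

Run as "java TlsMapsCheck PID" with the PID of the restarted Tomcat process; reading another user's /proc entry may require matching permissions.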
The affected system as per "uname -a":
Linux myhost 2.4.20-30.9 #1 Wed Feb 4 20:44:26 EST 2004 i686 i686 i386 GNU/Linux
Wild enhancement suggestion to prevent similar stability nightmares in the future:
=======================================================================
(Admittedly, it is not really Tomcat's task to work around operating system bugs
at the application level, but if it is easy, why not do it anyway?)
Idea how this could work:
org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run(ThreadPool.java:616)
    synchronized (this) {
        while (!shouldRun && !shouldTerminate) {
            this.wait();
        }
        _shouldRun = shouldRun;
        _shouldTerminate = shouldTerminate;
        _toRun = toRun;
    }
Why could this thread not only wait to be notified about "work", but also wake
up by itself, say once a minute (and, if another thread is already working, go
straight back to sleep since it cannot get the synchronized object)?
If that were possible, a big warning should be printed (provided one can detect
whether the thread woke up by itself instead of being notified) saying that Tomcat
is covering up for underlying OS problems and that the sysadmin had better find
out about a possible patch...
What do you think?
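The idea could be sketched roughly as follows. This is not Tomcat's ThreadPool code: the class TimedWaitSketch and all of its members are hypothetical. Object.wait(long) does not report why it returned, so the sketch treats a wakeup as "by itself" when the full period has elapsed and neither flag has been set. Note that an idle pool thread times out like this all the time, so a self-wakeup alone says nothing about the OS; a warning would only make sense if an additional health check at that point (assumed here, not implemented) found the connector without an accepting thread.

```java
// Hypothetical sketch of a worker that waits for work with a timeout.
final class TimedWaitSketch {
    private final Object lock = new Object();
    private boolean shouldRun;
    private boolean shouldTerminate;
    private int selfWakeups;

    /** Hands work to the waiting thread. */
    void signalWork() {
        synchronized (lock) {
            shouldRun = true;
            lock.notify();
        }
    }

    /** Asks the waiting thread to stop. */
    void terminate() {
        synchronized (lock) {
            shouldTerminate = true;
            lock.notify();
        }
    }

    /** Number of times the wait ended by timeout with nothing to do. */
    int selfWakeups() {
        synchronized (lock) {
            return selfWakeups;
        }
    }

    /**
     * Blocks until work or termination is signalled, waking up every
     * periodMillis in between. Returns true for work, false for
     * termination; interruption is treated as termination.
     */
    boolean awaitWork(long periodMillis) {
        synchronized (lock) {
            while (!shouldRun && !shouldTerminate) {
                long start = System.nanoTime();
                try {
                    lock.wait(periodMillis);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return false;
                }
                long waitedNanos = System.nanoTime() - start;
                if (!shouldRun && !shouldTerminate
                        && waitedNanos >= periodMillis * 1_000_000L) {
                    // Woke up by timeout: a health check could go here.
                    selfWakeups++;
                }
            }
            if (shouldTerminate) {
                return false;
            }
            shouldRun = false;
            return true;
        }
    }
}
```

A pool thread would call awaitWork(60000) in its loop where the excerpt above calls this.wait(); signalWork() and terminate() stand in for whatever sets shouldRun and shouldTerminate in the real pool.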
The same appears to have happened with http://issues.apache.org/jira/browse/JAMES-324