Bug 29907 - connector without accept thread (was: hanging during SSL negotiation)
Summary: connector without accept thread (was: hanging during SSL negotiation)
Status: RESOLVED INVALID
Alias: None
Product: Tomcat 5
Classification: Unclassified
Component: Connector:Coyote (show other bugs)
Version: 5.0.27
Hardware: Other other
: P3 enhancement (vote)
Target Milestone: ---
Assignee: Tomcat Developers Mailing List
URL: http://marc.theaimsgroup.com/?l=tomca...
Keywords:
Depends on:
Blocks:
 
Reported: 2004-07-04 08:21 UTC by Ralf Hauser
Modified: 2004-11-16 19:05 UTC (History)
0 users



Attachments
threadDump3Connectors.txt (27.07 KB, text/plain)
2004-10-06 19:38 UTC, Ralf Hauser
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Ralf Hauser 2004-07-04 08:21:56 UTC
after running tomcat for a while, all of a sudden browsers keep waiting forever
after connecting. With the links-linemode browser, it appears that the reason
for it surfaced: it hangs during "SSL negotiation"
When stopping tomcat manually, I get
<<<<DEBUG [http-8080-Processor4] (StandardWrapper.java:700) -   Returning non-STM
instance
Jul 3, 2004 3:52:45 PM org.apache.coyote.http11.Http11Protocol pause
INFO: Pausing Coyote HTTP/1.1 on http-8443
Jul 3, 2004 3:52:46 PM org.apache.catalina.core.StandardService stop
INFO: Stopping service Privasphere-Standalone
Jul 3, 2004 3:52:46 PM org.apache.catalina.core.StandardHost getDeployer
INFO: Create Host deployer for direct deployment ( non-jmx )
Jul 3, 2004 3:52:46 PM org.apache.catalina.core.StandardHostDeployer remove
INFO: Removing web application at context path
DEBUG [main] (StandardManager.java:676) - Stopping
DEBUG [main] (StandardManager.java:500) - Unloading persisted sessions
DEBUG [main] (StandardManager.java:507) - Saving persisted sessions to SESSIONS.ser
DEBUG [main] (StandardManager.java:530) - Unloading 0 sessions
DEBUG [main] (StandardManager.java:574) - Expiring 0 persisted sessions
DEBUG [main] (StandardManager.java:586) - Unloading complete
Jul 3, 2004 3:52:46 PM org.apache.catalina.core.StandardHost getDeployer
INFO: Create Host deployer for direct deployment ( non-jmx )
Jul 3, 2004 3:52:46 PM org.apache.catalina.core.StandardHostDeployer remove
INFO: Removing web application at context path
DEBUG [main] (StandardManager.java:676) - Stopping
DEBUG [main] (StandardManager.java:500) - Unloading persisted sessions
DEBUG [main] (StandardManager.java:507) - Saving persisted sessions to SESSIONS.ser
DEBUG [main] (StandardManager.java:530) - Unloading 8 sessions
 INFO [main] (ManagerBase.java:898) - Cannot serialize session attribute
logoPrepaid for session EB7C9AF5C1736B2D04765F4DB2BE2E
67
java.io.NotSerializableException: java.io.ByteArrayInputStream
        at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1054)
        at
java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1332)
        at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1304)
        at
java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1247)
        at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1052)
        at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:278)
        at
org.apache.catalina.session.StandardSession.writeObject(StandardSession.java:1411)
        at
org.apache.catalina.session.StandardSession.writeObjectData(StandardSession.java:899)
        at
org.apache.catalina.session.StandardManager.doUnload(StandardManager.java:539)
        at
org.apache.catalina.session.StandardManager.unload(StandardManager.java:485)
        at
org.apache.catalina.session.StandardManager.stop(StandardManager.java:687)
        at org.apache.catalina.core.StandardContext.stop(StandardContext.java:4474)
        at
org.apache.catalina.core.ContainerBase.removeChild(ContainerBase.java:952)
        at
org.apache.catalina.core.StandardHostDeployer.remove(StandardHostDeployer.java:670)
        at org.apache.catalina.core.StandardHost.remove(StandardHost.java:946)
        at org.apache.catalina.startup.HostConfig.undeployApps(HostConfig.java:1009)
        at org.apache.catalina.startup.HostConfig.stop(HostConfig.java:986)
        at
org.apache.catalina.startup.HostConfig.lifecycleEvent(HostConfig.java:351)
        at
org.apache.catalina.util.LifecycleSupport.fireLifecycleEvent(LifecycleSupport.java:119)
        at org.apache.catalina.core.ContainerBase.stop(ContainerBase.java:1123)
        at org.apache.catalina.core.ContainerBase.stop(ContainerBase.java:1135)
        at org.apache.catalina.core.StandardEngine.stop(StandardEngine.java:483)
        at org.apache.catalina.core.StandardService.stop(StandardService.java:538)
        at org.apache.catalina.core.StandardServer.stop(StandardServer.java:2332)
        at org.apache.catalina.startup.Catalina.stop(Catalina.java:605)
        at org.apache.catalina.startup.Catalina.start(Catalina.java:580)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:324)
        at org.apache.catalina.startup.Bootstrap.start(Bootstrap.java:284)
        at org.apache.catalina.startup.Bootstrap.main(Bootstrap.java:422)
 INFO [main] (ManagerBase.java:898) - Cannot serialize session attribute
payRecip for session EB7C9AF5C1736B2D04765F4DB2BE2E67
java.io.NotSerializableException: java.io.ByteArrayInputStream
        at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1054)
        at
java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1332)
        at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1304)
        at
java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1247)
        at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1052)
        at
java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1332)
        at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1304)
        at
java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1247)
        at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1052)
        at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:278)
        at
org.apache.catalina.session.StandardSession.writeObject(StandardSession.java:1411)
        at
org.apache.catalina.session.StandardSession.writeObjectData(StandardSession.java:899)
        at
org.apache.catalina.session.StandardManager.doUnload(StandardManager.java:539)
        at
org.apache.catalina.session.StandardManager.unload(StandardManager.java:485)
        at
org.apache.catalina.session.StandardManager.stop(StandardManager.java:687)
        at org.apache.catalina.core.StandardContext.stop(StandardContext.java:4474)
        at
org.apache.catalina.core.ContainerBase.removeChild(ContainerBase.java:952)
        at
org.apache.catalina.core.StandardHostDeployer.remove(StandardHostDeployer.java:670)
        at org.apache.catalina.core.StandardHost.remove(StandardHost.java:946)
        at org.apache.catalina.startup.HostConfig.undeployApps(HostConfig.java:1009)
        at org.apache.catalina.startup.HostConfig.stop(HostConfig.java:986)
        at
org.apache.catalina.startup.HostConfig.lifecycleEvent(HostConfig.java:351)
        at
org.apache.catalina.util.LifecycleSupport.fireLifecycleEvent(LifecycleSupport.java:119)
        at org.apache.catalina.core.ContainerBase.stop(ContainerBase.java:1123)
        at org.apache.catalina.core.ContainerBase.stop(ContainerBase.java:1135)
        at org.apache.catalina.core.StandardEngine.stop(StandardEngine.java:483)
        at org.apache.catalina.core.StandardService.stop(StandardService.java:538)
        at org.apache.catalina.core.StandardServer.stop(StandardServer.java:2332)
        at org.apache.catalina.startup.Catalina.stop(Catalina.java:605)
        at org.apache.catalina.startup.Catalina.start(Catalina.java:580)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:324)
        at org.apache.catalina.startup.Bootstrap.start(Bootstrap.java:284)
        at org.apache.catalina.startup.Bootstrap.main(Bootstrap.java:422)
DEBUG [main] (StandardManager.java:574) - Expiring 8 persisted sessions
DEBUG [main] (StandardManager.java:586) - Unloading complete
 INFO [main] (StandardWrapper.java:1213) - Waiting for 2 instance(s) to be
deallocated
 INFO [main] (StandardWrapper.java:1213) - Waiting for 2 instance(s) to be
deallocated
 INFO [main] (StandardWrapper.java:1213) - Waiting for 2 instance(s) to be
deallocated
java.lang.OutOfMemoryError
...>> other services with non-https connectors were not affected.

My suspicion is that the insufficient memory was the root cause. If so, it would
be great to throw the OutOfMemoryError immediately.
Comment 1 Ralf Hauser 2004-07-06 09:39:16 UTC
- Mozilla hangs after saying "Connected to hostname.domain.tld..." in the status
bar without a timeout (http://bugzilla.mozilla.org/show_bug.cgi?id=249976)
- MSIE 6.0.2800.1106.xpsp2.030422-1633 says in the status bar "opening page
https://hostname.domain.tld:port/..." - no timeout
- Netscape 4.7: "Connect: host hostname.domain.tld:8443 contacted - waiting for
reply" - no timeout
- redhat 9 ELinks 0.4.2 - Text WWW browser: "SSL negotiation" and no timeout
- redhat 9 Links 0.96 - "SSL negotiation" and no timeout
- Opera 7.50K 'Sende Anfrage an hostname.domain.tld...'

if anybody has a recommendation which browser to use to learn where it breaks
down, this would be highly appreciated!
Comment 2 Ralf Hauser 2004-07-06 09:41:08 UTC
Links 0.96: gives a late timeout 
"Error

Receive timeout

[cancel]"
Comment 3 Remy Maucherat 2004-07-06 15:55:50 UTC
I really don't see anything to fix here (if you think otheriwse, please propose
a patch).
You have to remember Tomcat isn't implementing SSL (JSSE does that).
Comment 4 Ralf Hauser 2004-09-11 07:13:58 UTC
The following is one hypothesis for this, but we have seen it also happen on
non-ssl connectors (i.e. regular http) - anyway, here the reasoning goes as per
http://forum.java.sun.com/wireless/thread.jsp?forum=2&thread=502002&message=2374415#2374415
===
We had exactly the same problem and it appeared to be a problem with /dev/random.

It seems that if there's not enough entropy, then the usage of /dev/random might
block waiting for entropy.
You propably don't have enough key pressings, mouse usage, net traffic, etc. for
the entropy required to be filled.

You can set the system property "java.security.egd" to "file:/dev/urandom"  
to use /dev/urandom instead.
/dev/urandom is secure enough for SSL connection usage.

Hope this helps you!

- Atso.
===
see also: http://lists.gnupg.org/pipermail/gnupg-devel/2003-February/019736.html
Comment 5 Ralf Hauser 2004-10-06 19:37:53 UTC
more insights on this
- forget the stacktrace in original description, it is probably our watcher
script, that automatically restarted tomcat (and I didn't know about this)

However:
- the problem that a connector hangs happens with all 3 connectors I have
configured in my server.xml (8443 with https, 8080 and 2712 with plain http)
- what is still possible is to send tomcat a kill -QUIT to obtain a full thread
dump and nothing appears to be particularly special since the server is not very
busy.
However, for each connector I see up to minSpareThreads="25" threads and always
one in java.net.PlainSocketImpl.socketAccept() and the rest in
java.lang.Object.wait(). This is the healthy situation.

Once tomcat hangs, on 1 or more of the connectors, all the threads are in the
wait state and none doing socketAccept.

In the logs, there is no indication why the connector should loose the accepting
thread. The only stacktraces I see come from the struts file-download when,
somebody aborts it and other simple errors in the servlet application (typically
leading to 
...
        at org.apache.struts.action.ActionServlet.doPost(ActionServlet.java:525)
        at javax.servlet.http.HttpServlet.service(HttpServlet.java:717)
        at javax.servlet.http.HttpServlet.service(HttpServlet.java:810)
...
at org.apache.catalina.core.ContainerBase.invoke(ContainerBase.java:929)
        at org.apache.coyote.tomcat5.CoyoteAdapter.service(CoyoteAdapter.java:160)
...)

Any hints how to remedy this would be highly appreciated!  I'll attach the JVM
thread dump next.

P.S.: Appears to have some similarity to the mailing-list description as per the
above URL.
P.P.S.: there is no out-of-memory situation before like in Bug 31426
Comment 6 Ralf Hauser 2004-10-06 19:38:30 UTC
Created attachment 12969 [details]
threadDump3Connectors.txt
Comment 7 Ralf Hauser 2004-11-03 09:27:37 UTC
It seems that we have found the solution (keep the fingers crossed  ;) ):
Apparently, Redhat 9. was the first Redhat Release that uses the new Posix
 Threads (NPTL), backported by Redhat from 2.6 Kernel to 2.4
Unfortunately, this appears to be buggy (e.g. CommunigatePro also has not been
"cleared" for redhat)

How to fix:
> mv /lib/tls /lib/tls.unused
> ldconfig
> catalina.sh restart

--> this brings back the old, classic Linux-Thread Implementation - slower, but
hopefully more robust.

When doing less /proc/PID/maps
no more shared objects from /lib/tls should be visible

The affected system as per "uname -a":
Linux myhost 2.4.20-30.9 #1 Wed Feb 4 20:44:26 EST 2004 i686 i686 i386 GNU/Linux

Wild enhancement suggestion to prevent similar stability nightmares in the future:
=======================================================================
(admitted, it is not really Tomcat's task to fix operating system bugs on the
application level, but if it is easy why not doing it anyway)

Idea how this could work:
org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run(ThreadPool.java:616)

                       synchronized (this) {
                            while (!shouldRun && !shouldTerminate) {
                                this.wait();
                            }
                            _shouldRun = shouldRun;
                            _shouldTerminate = shouldTerminate;
                            _toRun = toRun;
                        }

Why could this not only wait to be notified about "work", but also perhaps wake
up every minute once by itself (and if another thread is already working going
immediately back to sleep since it cannot get the synchronized object.)
If that was possible, a big warning should be printed (provided one can detect
whether the thread woke up by itself instead of being notified) that tomcat is
covering up for underlying OS problems and the Sys-Admin better get informed
about possible patch...

What do you think?
Comment 8 Ralf Hauser 2004-11-07 09:48:46 UTC
the same appears to have happened also with
http://issues.apache.org/jira/browse/JAMES-324