Bug 23146 - Calling socket.setTcpNoDelay causes connector to disconnect
Summary: Calling socket.setTcpNoDelay causes connector to disconnect
Status: RESOLVED INVALID
Alias: None
Product: Tomcat 5
Classification: Unclassified
Component: Connector:Coyote
Version: Nightly Build
Hardware: Sun Solaris
Importance: P1 blocker with 1 vote
Target Milestone: ---
Assignee: Tomcat Developers Mailing List
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2003-09-12 19:32 UTC by John Cater
Modified: 2004-11-16 19:05 UTC
CC List: 3 users



Description John Cater 2003-09-12 19:32:57 UTC
In org.apache.catalina.connector.HttpConnector, line 1028, the connector calls 
setTcpNoDelay on the newly created socket.  This method can throw a 
SocketException, which is not caught.  The exception causes the HttpConnector 
thread to finish, removing the Socket object without explicitly closing the 
underlying network socket.  Tomcat then attempts to recreate the HttpConnector, 
which attempts to open a new socket with the same address binding.  This will 
fail, and the application will then cease to respond on the given HTTP port.

This is happening very intermittently on a production machine we have.  This is 
very difficult to replicate, but I hope that we have given enough details to 
fix the underlying problem, which is the uncaught SocketException.
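A minimal sketch of the handling this report asks for, assuming a simple blocking accept loop: the `SocketException` from `setTcpNoDelay` is caught per connection, the offending socket is closed explicitly, and the loop keeps accepting, so one bad connection cannot end the listener thread. The names `GuardedAcceptor`, `configureOrDiscard`, and the `handler` callback are hypothetical and are not the actual `HttpConnector` code.

```java
import java.io.IOException;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketException;
import java.util.function.Consumer;

// Hypothetical acceptor illustrating the fix; not Tomcat's HttpConnector.
class GuardedAcceptor {
    private final ServerSocket serverSocket;
    private final Consumer<Socket> handler; // hypothetical request processor

    GuardedAcceptor(ServerSocket serverSocket, Consumer<Socket> handler) {
        this.serverSocket = serverSocket;
        this.handler = handler;
    }

    /**
     * Applies TCP_NODELAY. If the option is rejected (for example an OS-level
     * EINVAL surfaced as SocketException), closes this one socket and returns
     * false instead of letting the exception escape the accept thread.
     */
    static boolean configureOrDiscard(Socket socket) {
        try {
            socket.setTcpNoDelay(true);
            return true;
        } catch (SocketException e) {
            System.err.println("Discarding connection, setTcpNoDelay failed: " + e.getMessage());
            try {
                socket.close(); // release the underlying network socket explicitly
            } catch (IOException ignored) {
                // Nothing more to do for a connection we are abandoning.
            }
            return false;
        }
    }

    /** Accept loop: a bad connection is skipped, the listener stays up. */
    void run() {
        while (!serverSocket.isClosed()) {
            Socket socket;
            try {
                socket = serverSocket.accept();
            } catch (IOException e) {
                if (serverSocket.isClosed()) {
                    break; // normal shutdown
                }
                continue; // transient accept failure; a real connector would back off here
            }
            if (configureOrDiscard(socket)) {
                handler.accept(socket);
            }
        }
    }
}
```

The key design choice is that the try/catch wraps only the per-connection option call, so the exception's blast radius is one socket rather than the whole connector.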
Comment 1 Yuri Schimke 2003-09-12 19:38:33 UTC
We are getting this on a Solaris box. It occurs occasionally.

I have filed a bug with Sun, because I can't see why this should be failing.
I searched the web and found two previous cases of this, one with Jetty and one
with Tomcat, but no solution was found for either. Both were on Solaris boxes.
Comment 2 John Cater 2003-09-12 19:40:19 UTC
I think that's good, but it doesn't really change the fact that the exception 
isn't being checked for at all.
Comment 3 Yuri Schimke 2003-09-12 19:49:07 UTC
You should move to using the CoyoteConnector. It's a lot better in general.
Luckily, it calls setTcpNoDelay in the WorkerThread rather than in the
ServerSocket thread, so your problem doesn't occur. Although, as far as I can
tell, there are fewer (though likely still some) threads serving the request.

java.net.SocketException: Invalid argument
        at java.net.PlainSocketImpl.socketSetOption(Native Method)
        at java.net.PlainSocketImpl.setOption(PlainSocketImpl.java:240)
        at java.net.Socket.setTcpNoDelay(Socket.java:745)
        at org.apache.tomcat.util.net.PoolTcpEndpoint.setSocketOptions(PoolTcpEndpoint.java:468)
        at org.apache.tomcat.util.net.TcpWorkerThread.runIt(PoolTcpEndpoint.java:564)
        at org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run(ThreadPool.java:619)
        at java.lang.Thread.run(Thread.java:536)
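As the trace shows, the failure surfaces inside a worker thread rather than the accept thread. A minimal sketch of that split, assuming a plain `ExecutorService` as the worker pool: the accept thread only calls `accept()` and hands the socket off, and the worker applies options first, closing the socket without processing it if that throws. `HandoffAcceptor`, its `SocketOptions` hook, and `serveOne` are hypothetical stand-ins, not Coyote's `PoolTcpEndpoint` API.

```java
import java.io.IOException;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch of accept/worker separation; not Coyote's actual classes.
class HandoffAcceptor {
    /** Hypothetical hook for applying socket options; may throw like setTcpNoDelay. */
    interface SocketOptions {
        void apply(Socket socket) throws SocketException;
    }

    final AtomicInteger processed = new AtomicInteger();
    final AtomicInteger discarded = new AtomicInteger();

    private final ServerSocket serverSocket;
    private final ExecutorService workers;
    private final SocketOptions options;

    HandoffAcceptor(ServerSocket serverSocket, ExecutorService workers, SocketOptions options) {
        this.serverSocket = serverSocket;
        this.workers = workers;
        this.options = options;
    }

    /** Accept thread: no option setting here, so it cannot die on EINVAL. */
    void acceptOne() throws IOException {
        Socket socket = serverSocket.accept();
        workers.execute(() -> serveOne(socket));
    }

    /** Worker thread: set options first; on failure drop only this connection. */
    void serveOne(Socket socket) {
        try (Socket s = socket) {
            try {
                options.apply(s);
            } catch (SocketException e) {
                discarded.incrementAndGet();
                return; // try-with-resources closes the socket; it is never processed
            }
            processed.incrementAndGet(); // real request processing would go here
        } catch (IOException e) {
            // close() failed; nothing else to do for this connection
        }
    }
}
```

Discarding the socket outright, rather than passing it on after a failed option call, keeps a half-configured connection from reaching request processing.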

From some of the open bugs, I think the Catalina HttpConnector is basically
unmaintained.
Comment 4 J S Dhillon 2003-09-23 03:57:12 UTC
At around 100 concurrent users (sometimes even fewer), Tomcat stops responding,
and the console shows this exception:
----------------------------------------------------
bash-2.03# Sep 21 03:16:04 raffles sendmail[7390]: [ID 702911 mail.alert] unable to qualify my own domain name (raffles) -- using short name
java.net.SocketException: Invalid argument
        at java.net.PlainSocketImpl.socketSetOption(Native Method)
        at java.net.PlainSocketImpl.setOption(PlainSocketImpl.java:240)
        at java.net.Socket.setSoLinger(Socket.java:814)
        at org.apache.tomcat.util.net.PoolTcpEndpoint.setSocketOptions(PoolTcpEndpoint.java:434)
        at org.apache.tomcat.util.net.TcpWorkerThread.runIt(PoolTcpEndpoint.java:533)
        at org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run(ThreadPool.java:530)
        at java.lang.Thread.run(Thread.java:534)
----------------------------------------------------

Note: We recently ported our product from Jigsaw 2.2.1 to Tomcat. Because this
issue affects our RC build, which is now going through performance testing, the
product's release is on hold until the issue is resolved.
Comment 5 Remy Maucherat 2003-09-23 06:37:02 UTC
I know, I know. Mentioning your OS and VM is irrelevant, and takes time.
Anyway, this error is caught (what you see is the result of a printStackTrace; I
will improve the logging of this for Tomcat 5.0 by not catching the exception in
setSocketOptions). The problem is that if this error occurs, I think the server
socket is dead (and the VM's network stack may be as well). So try a
different VM (for example, IBM's if you're running under Linux), or apply all OS
kernel patches and use the latest VM (Solaris).
I don't see how this report can be caused by a Tomcat bug, rather than a VM bug.
If all of the submitters use Solaris, I think it's time to head to java.sun.com
and file a bug there (assuming you use an up to date system).
Comment 6 Yuri Schimke 2003-09-23 06:54:26 UTC
Yep, totally agree with you.  It's got to be an OS or at least JVM bug.  I'm
guessing everyone is on Sun boxes.

I've previously filed a bug with java.sun.com, but won't hear anything back for
a couple more weeks. It's in java.sun limbo at the moment.

We are also using virtual network endpoints on a single box; it's a high-end
production box. I wonder if others are too?

I'm not sure what effect your change has. This error was the only one we were
getting. I assumed the socket was thrown away, but since the exception was
previously being caught, the socket was probably still working after the
exception. Now that you are letting the exception propagate further up, will
those threads keep processing requests?
Comment 7 Remy Maucherat 2003-09-23 07:06:22 UTC
Previously, the socket was not thrown away right away, but passed along for
normal processing. It's sort of unpredictable what happened after that, so I
think I will port the change to Tomcat 4.1.x.
Comment 8 Peter Dreessen 2003-10-02 13:54:22 UTC
We have encountered this bug on a Tru64 platform, so it is probably not an
OS-related problem. We are using Tomcat version 4.1.24.
Comment 9 J S Dhillon 2003-10-06 04:05:19 UTC
The stack trace given by 'Yuri Schimke' on 09-12-2003 is different from the one
given by 'J S Dhillon' on 09-23-2003. I have run into both of these at my end,
and the symptoms are nearly the same, so it seems these two stack traces
correspond to two different bugs in Tomcat 4.1.xx. Later today I will file a
separate bug for the second stack trace.

Both can easily be reproduced at our end. The second one occurs more often
than the first.
Comment 10 Yuri Schimke 2003-11-25 12:57:56 UTC
Answer from Sun in relation to setTcpNoDelay failing is:

"This error is caused by an EINVAL return code from the setsockopt() system
call. The most likely cause of this (considering that it happens only sometimes)
is that the socket has been shut down or closed in some way, but Java (or Apache)
thinks it is still open. Which version of Tomcat are you running, and how easy is
it to reproduce this?"

As annoying as this is, this sounds reasonable.  I can't say this isn't
happening.  So something could be opening sockets and closing them immediately.  

In that case, the current Coyote connector handling (especially in 4.1.29 and
later) looks like the correct behaviour: log the failed connection and ignore it.

Could this be a port scanner or health check?  I can't think of what would be
doing that in our system.

BTW, I guess the Catalina connector is still broken, because it doesn't catch the
exception. But at least the Coyote connector handles it gracefully, and it isn't
an indicator of bad health of the system.
Comment 11 Mark Thomas 2004-06-19 12:23:56 UTC
This bug report has been transferred to Tomcat 5 because TC4 and TC5 share the
connectors, and TC5 is now the focus of development effort.
Comment 12 Remy Maucherat 2004-06-19 12:27:20 UTC
There's really no bug here.