In org.apache.catalina.connector.HttpConnector, line 1028, the connector calls setTcpNoDelay on the newly created socket. This method can throw a SocketException, which is not caught. The exception terminates the HttpConnector thread, discarding the Socket object without explicitly closing the underlying network socket. Tomcat then attempts to recreate the HttpConnector, which tries to bind a new server socket to the same address. The bind fails, and the application then ceases to respond on the given HTTP port. This is happening very intermittently on one of our production machines. It is very difficult to replicate, but I hope we have given enough detail to fix the underlying problem, which is the uncaught SocketException.
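The fix being asked for can be sketched as a defensive wrapper around the option call: catch the SocketException, close the offending client socket, and let the connector thread carry on. This is a minimal sketch, not Tomcat's actual code; the class and method names here are invented.

```java
import java.net.Socket;
import java.net.SocketException;

public class SafeSocketOptions {

    // Applies per-connection options; returns false (instead of letting
    // a SocketException kill the calling thread) if the socket is dead.
    public static boolean safeSetOptions(Socket socket) {
        try {
            socket.setTcpNoDelay(true);
            return true;
        } catch (SocketException e) {
            // EINVAL and friends: the socket was shut down or closed
            // underneath us. Discard this one connection only.
            try {
                socket.close();
            } catch (Exception ignored) {
                // already closed; nothing more to release
            }
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        Socket live = new Socket();
        System.out.println(safeSetOptions(live)); // prints true
        live.close();

        Socket dead = new Socket();
        dead.close();
        System.out.println(safeSetOptions(dead)); // prints false
    }
}
```

With handling like this, a dying client connection costs at most one request instead of the whole connector thread.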
We are getting this on a Solaris box. It occurs occasionally. I have filed a bug with Sun, because I can't see why this should be failing. I searched the web and found two previous cases of this, one on Jetty and one on Tomcat, but no solution was found for either. Both were on Solaris boxes.
I think that's good, but it doesn't really change the fact that the exception isn't being checked for at all.
You should move to using the CoyoteConnector; it's a lot better in general. Conveniently, it calls setTcpNoDelay in the worker thread rather than in the ServerSocket accept thread, so your problem doesn't occur: at worst the failure affects fewer threads, although likely still some that are serving requests.

java.net.SocketException: Invalid argument
        at java.net.PlainSocketImpl.socketSetOption(Native Method)
        at java.net.PlainSocketImpl.setOption(PlainSocketImpl.java:240)
        at java.net.Socket.setTcpNoDelay(Socket.java:745)
        at org.apache.tomcat.util.net.PoolTcpEndpoint.setSocketOptions(PoolTcpEndpoint.java:468)
        at org.apache.tomcat.util.net.TcpWorkerThread.runIt(PoolTcpEndpoint.java:564)
        at org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run(ThreadPool.java:619)
        at java.lang.Thread.run(Thread.java:536)

From some of the open bugs, I think the Catalina HttpConnector is basically unmaintained.
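The architectural point above, that per-socket options should be applied in a worker so a failure never takes down the accept thread, can be sketched as a simplified acceptor/worker server. This is an illustration under my own invented names, not Coyote's actual implementation.

```java
import java.io.IOException;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// The acceptor thread only accepts; per-socket options are applied in
// a worker, so a SocketException there kills at most one request.
public class AcceptorWorkerSketch {

    static void handle(Socket s) {
        try {
            s.setTcpNoDelay(true); // may throw if the peer already vanished
            // ... serve the request ...
        } catch (SocketException e) {
            // log and discard only this socket; the acceptor never sees it
        } finally {
            try { s.close(); } catch (IOException ignored) { }
        }
    }

    public static void main(String[] args) throws Exception {
        ServerSocket server = new ServerSocket(0); // ephemeral port
        ExecutorService workers = Executors.newFixedThreadPool(4);
        Thread acceptor = new Thread(() -> {
            while (!server.isClosed()) {
                try {
                    Socket s = server.accept();
                    workers.execute(() -> handle(s)); // hand off immediately
                } catch (IOException e) {
                    break; // server socket closed: shut down the acceptor
                }
            }
        });
        acceptor.start();

        // demo client: the acceptor stays alive regardless of what
        // happens to this individual connection
        try (Socket client = new Socket("localhost", server.getLocalPort())) {
            System.out.println("connected: " + client.isConnected());
        }
        server.close();
        workers.shutdown();
    }
}
```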
At around 100-user load (sometimes even lower), Tomcat stops responding; when viewed in the console, this exception is thrown.
----------------------------------------------------
bash-2.03# Sep 21 03:16:04 raffles sendmail[7390]: [ID 702911 mail.alert] unable to qualify my own domain name (raffles) -- using short name
java.net.SocketException: Invalid argument
        at java.net.PlainSocketImpl.socketSetOption(Native Method)
        at java.net.PlainSocketImpl.setOption(PlainSocketImpl.java:240)
        at java.net.Socket.setSoLinger(Socket.java:814)
        at org.apache.tomcat.util.net.PoolTcpEndpoint.setSocketOptions(PoolTcpEndpoint.java:434)
        at org.apache.tomcat.util.net.TcpWorkerThread.runIt(PoolTcpEndpoint.java:533)
        at org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run(ThreadPool.java:530)
        at java.lang.Thread.run(Thread.java:534)
----------------------------------------------------
Note: we recently ported our product from Jigsaw 2.2.1 to Tomcat. This issue surfaced in our RC build, which is now going through performance testing, so the product's release is on hold until the issue is resolved.
I know, I know. Mentioning your OS and VM is irrelevant, and takes time. Anyway, this error is caught (what you see is the result of a printStackTrace; I will improve the logging of this for Tomcat 5.0 by not catching the exception in setSocketOptions). The problem is that if this error occurs, I think the server socket is dead (and the VM's network stack may be as well). So try a different VM (for example, IBM's if you're running on Linux), or apply all OS kernel patches and use the latest VM (Solaris). I don't see how this report can be caused by a Tomcat bug rather than a VM bug. If all of the submitters use Solaris, I think it's time to head to java.sun.com and file a bug there (assuming you use an up-to-date system).
Yep, totally agree with you. It's got to be an OS bug, or at least a JVM bug. I'm guessing everyone is on Sun boxes. I've previously filed a bug with java.sun.com, but I won't find anything out for a couple more weeks; it's in java.sun limbo at the moment. We are also using virtual network endpoints on a single box (a high-end production box); I wonder if others are too? I'm not sure what effect your change has. This error was the only one we were getting; I assumed that socket was thrown away, but it was probably still working after the exception, since the exception was previously being caught. Now that you are letting the exception propagate higher, will those threads keep processing requests?
Previously, the socket was not thrown away right away, but passed along for normal processing. What happened after that was somewhat unpredictable, so I think I will port the change to Tomcat 4.1.x.
We have encountered this bug on a Tru64 platform, so it is probably not an OS-related problem. We are using Tomcat version 4.1.24.
The stack trace given by 'Yuri Schimke' on 09-12-2003 is different from the one given by 'J S Dhillon' on 09-23-2003. I have faced both of these here at my end, and the symptoms are nearly the same, so it seems the two stack traces correspond to two different bugs in Tomcat 4.1.xx. Later today I will file a separate bug for the second stack trace. Both can easily be reproduced at our end; the second one is more prominent than the first.
The answer from Sun in relation to setTcpNoDelay failing is: "This error is caused by an EINVAL return code from the setsockopt() system call. The most likely cause of this (considering that it happens only sometimes) is that the socket has been shut down or closed in some way, but Java (or Apache) thinks it is still open. Which version of Tomcat are you running, and how easy is it to reproduce this?" As annoying as this is, it sounds reasonable, and I can't say it isn't happening: something could be opening sockets and closing them immediately. In that case the current Coyote connector handling (especially >= 4.1.29) looks like the correct behaviour: log the failed connection and ignore it. Could this be a port scanner or a health check? I can't think of what would be doing that in our system. BTW, I guess the Catalina connector is still broken, because it doesn't catch the exception. But at least the Coyote connector handles it gracefully, and it isn't an indicator of bad system health.
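If Sun's explanation is right, the scenario to tolerate is a client (a port scanner, a load-balancer health check) that connects and drops the connection immediately, before the server applies its socket options. A minimal sketch of that scenario with log-and-ignore handling, using invented names rather than the actual Coyote code:

```java
import java.io.IOException;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketException;

// Simulates the "port scanner" hypothesis: a client connects and drops
// the connection at once. With log-and-ignore option handling, the
// listener survives and keeps serving well-behaved clients.
public class ScannerTolerantServer {
    public static void main(String[] args) throws Exception {
        ServerSocket server = new ServerSocket(0); // ephemeral port
        Thread listener = new Thread(() -> {
            while (!server.isClosed()) {
                try (Socket s = server.accept()) {
                    try {
                        s.setTcpNoDelay(true);
                        s.setSoLinger(true, 100);
                    } catch (SocketException e) {
                        // EINVAL-style failure: the peer is already gone.
                        // Log, discard this socket, keep accepting.
                        System.out.println("dropped dead socket");
                    }
                } catch (IOException e) {
                    break; // server socket closed: shut down the listener
                }
            }
        });
        listener.start();

        int port = server.getLocalPort();
        new Socket("localhost", port).close(); // "scanner": connect + drop
        try (Socket normal = new Socket("localhost", port)) {
            // the listener is still alive and accepting
            System.out.println("still serving: " + normal.isConnected());
        }
        server.close();
    }
}
```

Whether the option call actually fails depends on timing and the OS, which matches the intermittent reports in this thread; the point is that either outcome is absorbed without killing the listener.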
This bug report has been transferred to Tomcat 5 because TC4 and TC5 share the connectors, and TC5 is now the focus of development effort.
There's really no bug here.