After upgrading to Java 7, I've noticed some applications fail after about 50 days f (for instance May 8 till June 27). This is suspiciously close to 2 ^ 32 -1 milliseconds: Jun 27, 2014 9:24:47 AM org.apache.catalina.core.StandardServer await SEVERE: StandardServer.await: accept: java.net.SocketTimeoutException: Accept timed out at java.net.PlainSocketImpl.socketAccept(Native Method) at java.net.AbstractPlainSocketImpl.accept(AbstractPlainSocketImpl.java:398) at java.net.ServerSocket.implAccept(ServerSocket.java:530) at java.net.ServerSocket.accept(ServerSocket.java:498) at org.apache.catalina.core.StandardServer.await(StandardServer.java:431) at org.apache.catalina.startup.Catalina.await(Catalina.java:676) at org.apache.catalina.startup.Catalina.start(Catalina.java:628) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.catalina.startup.Bootstrap.start(Bootstrap.java:289) at org.apache.catalina.startup.Bootstrap.main(Bootstrap.java:414) Jun 27, 2014 9:24:47 AM org.apache.coyote.http11.Http11AprProtocol pause INFO: Pausing Coyote HTTP/1.1 on http-51080 Jun 27, 2014 9:24:47 AM org.apache.coyote.ajp.AjpAprProtocol pause INFO: Pausing Coyote AJP/1.3 on ajp-51009 Jun 27, 2014 9:24:48 AM org.apache.catalina.core.StandardService stop INFO: Stopping service Catalina After this, the application is no longer listening. I suspect this may be due to a change in behavior in Java. See: $ $JAVA_HOME/bin/java -version java version "1.7.0_60" Java(TM) SE Runtime Environment (build 1.7.0_60-b19) Java HotSpot(TM) 64-Bit Server VM (build 24.60-b09, mixed mode) $ strace -f -o /tmp/hh groovy -e 'new ServerSocket(60001, 1, InetAddress.getByName("localhost")).accept()' File contains: 2212 bind(47, {sa_family=AF_INET6, sin6_port=htons(60001), inet_pton(AF_INET6, "::ffff:127.0.0.1", &sin6_addr), sin6_flowinfo=0, sin6_scope_id=0}, 28) = 0 2212 listen(47, 1) = 0 2212 poll([{fd=47, events=POLLIN|POLLERR}], 1, 4294967295 <unfinished ...> $ $JAVA_HOME/bin/java -version java version "1.6.0_45" Java(TM) SE Runtime Environment (build 1.6.0_45-b06) Java HotSpot(TM) 64-Bit Server VM (build 20.45-b01, mixed mode) 2661 bind(46, {sa_family=AF_INET6, sin6_port=htons(60001), inet_pton(AF_INET6, "::ffff:127.0.0.1", &sin6_addr), sin6_flowinfo=0, sin6_scope_id=0}, 28) = 0 2661 listen(46, 1) = 0 2661 accept(46, <unfinished ...> While java6 calls accept(), java7 first waits for an event on the socket with poll(). However, it uses a timeout of 4294967295 milliseconds.
Hmm. My initial impression is that that is a JRE bug. I've contacted someone at Oracle (see the dev list) who has been helpful in pointing us in the right direction to report bugs. Keep an eye on the dev list for progress on that. However, I can't repeat the results you report. When I run strace on the same Java version as you are using, I see a call to poll with a timeout of -1 which I am assuming is an infinite timeout. You'll need to provide enough information about your environment to enable us (and Oracle) to repeat this otherwise this is going to get resolved as WORKSFORME.
I tested this on Ubuntu Trusty amd64. I can also reproduce on my home machine, which runas arch (amd64). You can try this Vagrantfile: Vagrant.configure('2') do |config| config.vm.box = "ubuntu/trusty64" provision_script = <<-eos cat > ~vagrant/run-test << eod #!/bin/bash # only one version of Java installed; no need to set JAVA_HOME # so Groovy picks the correct one aptitude install -y oracle-java7-installer groovy java -version strace -f -o /tmp/hh groovy -e 'new ServerSocket(60001, 1, InetAddress.getByName("localhost")).accept()' eod chmod +x ~vagrant/run-test apt-get install -y python-software-properties strace vim add-apt-repository -y ppa:webupd8team/java apt-get update eos config.vm.provision "shell", inline: provision_script end after ssh'ing, run sudo ./run-test, accept the license and then Ctrl+C. You can then instpect /tmp/hh
The strace output may actually be a bug in strace because poll() takes a signed int as timeout; it would seem strace is interpreting the value as an unsigned int. Indeed: $ cat a.c && gcc a.c && strace ./a.out #include <poll.h> #include <stdio.h> void main() { struct pollfd fd = { .fd = fileno(stdin), .events = POLLIN }; int timeout = -1; poll(&fd, 1, timeout); } execve("./a.out", ["./a.out"], [/* 57 vars */]) = 0 brk(0) = 0xeb5000 (...) poll([{fd=0, events=POLLIN}], 1, 4294967295^CProcess 6565 detached That leaves me (short of a bug in glibc or the kernel) with no explanation for the exception I got, which takes quite a while to reproduce, though if poll doesn't use the monotonic clock, there may be an faster way. I'll check tomorrow.
A quick observation: the poll syscall used to take a long, and now takes an int. Maybe strace hasn't been updated yet: http://lwn.net/Articles/483078/
I was testing on Ubuntu Precise. After updating to Trusty I see the same value as you of 4294967295. That, at least, explains why we were seeing different values. I've looked through both the Tomcat code and the Java 7 code and I don't see anything wrong (although I am no C/C++ expert). Neither do I see anything that explain the behaviour you are seeing. I think the next steps are to see if you can reproduce this issue with a test case that doesn't take ~50 days to run.
Javadoc for ServerSocket.accept() says that SocketTimeoutException can be thrown by the method "if a timeout was previously set with setSoTimeout". [1] [1] http://docs.oracle.com/javase/7/docs/api/java/net/ServerSocket.html#accept%28%29 BTW, Javadoc for ServerSocket.setSoTimeout() says that to set an infinite timeout one uses the value of "0". Is somebody confusing '0' and '-1'? I think the javadoc for ServerSocket.accept() implies that the default timeout is infinite. So I think it is a JRE bug. In general it makes some sense to protect against this. A timeout is not a "receiving a valid shutdown command" event, and encountering a timeout does not make the server socket an invalid one, so we should be able to continue.
(In reply to Konstantin Kolinko from comment #6) > BTW, > Javadoc for ServerSocket.setSoTimeout() says that to set an infinite timeout > one uses the value of "0". Is somebody confusing '0' and '-1'? Not that I can see having looked at the JRE code. > I think the javadoc for ServerSocket.accept() implies that the default > timeout is infinite. So I think it is a JRE bug. That is a possibility, but having looked at the JRE code, I don't see where the problem is. > In general it makes some sense to protect against this. A timeout is not a > "receiving a valid shutdown command" event, and encountering a timeout does > not make the server socket an invalid one, so we should be able to continue. Agreed. There is no reason I can think of that accept should time out so we can certiainly add some protection against this issue here. We can probably add some useful debug info as well.
Note that the protection and debug info is going to go into Tomcat 8. It would help a lot if you could upgrade at least one system experiencing this error to Tomcat 8.
Workaround added to 8.0.x for 8.0.10 onwards. If the timout occurs a warning is logged that reports how long accept was blocking for before the timeout.
The workaround and simple debug code was pretty simple so I have backported it to 7.0.x (it will be in 7.0.55 onwards) and proposed if for 6.0.x.
Work around added to 6.0.x for 6.0.42 onwards.
is this fix available in tomcat 7.0.50? I am getting this error in jdk1.7 64 bit tomcat 7.0.50
(In reply to fach hoch from comment #12) > is this fix available in tomcat 7.0.50? No, it's in 7.0.55, as stated in comment #10.