Bug 56684 - java7: java.net.SocketTimeoutException: Accept timed out
Summary: java7: java.net.SocketTimeoutException: Accept timed out
Status: RESOLVED FIXED
Alias: None
Product: Tomcat 6
Classification: Unclassified
Component: Catalina (show other bugs)
Version: 6.0.39
Hardware: PC Linux
: P2 major (vote)
Target Milestone: default
Assignee: Tomcat Developers Mailing List
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2014-06-30 12:21 UTC by Gustavo Lopes
Modified: 2014-08-05 03:27 UTC (History)
0 users



Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Gustavo Lopes 2014-06-30 12:21:22 UTC
After upgrading to Java 7, I've noticed some applications fail after about 50 days f (for instance May 8 till June 27). This is suspiciously close to 2 ^ 32 -1 milliseconds:

Jun 27, 2014 9:24:47 AM org.apache.catalina.core.StandardServer await
SEVERE: StandardServer.await: accept:
java.net.SocketTimeoutException: Accept timed out
        at java.net.PlainSocketImpl.socketAccept(Native Method)
        at java.net.AbstractPlainSocketImpl.accept(AbstractPlainSocketImpl.java:398)
        at java.net.ServerSocket.implAccept(ServerSocket.java:530)
        at java.net.ServerSocket.accept(ServerSocket.java:498)
        at org.apache.catalina.core.StandardServer.await(StandardServer.java:431)
        at org.apache.catalina.startup.Catalina.await(Catalina.java:676)
        at org.apache.catalina.startup.Catalina.start(Catalina.java:628)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.catalina.startup.Bootstrap.start(Bootstrap.java:289)
        at org.apache.catalina.startup.Bootstrap.main(Bootstrap.java:414)
Jun 27, 2014 9:24:47 AM org.apache.coyote.http11.Http11AprProtocol pause
INFO: Pausing Coyote HTTP/1.1 on http-51080
Jun 27, 2014 9:24:47 AM org.apache.coyote.ajp.AjpAprProtocol pause
INFO: Pausing Coyote AJP/1.3 on ajp-51009
Jun 27, 2014 9:24:48 AM org.apache.catalina.core.StandardService stop
INFO: Stopping service Catalina

After this, the application is no longer listening.

I suspect this may be due to a change in behavior in Java. See:

$ $JAVA_HOME/bin/java -version
java version "1.7.0_60"
Java(TM) SE Runtime Environment (build 1.7.0_60-b19)
Java HotSpot(TM) 64-Bit Server VM (build 24.60-b09, mixed mode)

$ strace -f -o /tmp/hh groovy -e 'new ServerSocket(60001, 1, InetAddress.getByName("localhost")).accept()'
File contains:
2212  bind(47, {sa_family=AF_INET6, sin6_port=htons(60001), inet_pton(AF_INET6, "::ffff:127.0.0.1", &sin6_addr), sin6_flowinfo=0, sin6_scope_id=0}, 28) = 0
2212  listen(47, 1)                     = 0
2212  poll([{fd=47, events=POLLIN|POLLERR}], 1, 4294967295 <unfinished ...>

$ $JAVA_HOME/bin/java -version
java version "1.6.0_45"
Java(TM) SE Runtime Environment (build 1.6.0_45-b06)
Java HotSpot(TM) 64-Bit Server VM (build 20.45-b01, mixed mode)

2661  bind(46, {sa_family=AF_INET6, sin6_port=htons(60001), inet_pton(AF_INET6, "::ffff:127.0.0.1", &sin6_addr), sin6_flowinfo=0, sin6_scope_id=0}, 28) = 0
2661  listen(46, 1)                     = 0
2661  accept(46,  <unfinished ...>

While java6 calls accept(), java7 first waits for an event on the socket with poll(). However, it uses a timeout of 4294967295 milliseconds.
Comment 1 Mark Thomas 2014-07-07 21:36:26 UTC
Hmm. My initial impression is that that is a JRE bug. I've contacted someone at Oracle (see the dev list) who has been helpful in pointing us in the right direction to report bugs. Keep an eye on the dev list for progress on that.

However, I can't repeat the results you report. When I run strace on the same Java version as you are using, I see a call to poll with a timeout of -1 which I am assuming is an infinite timeout.

You'll need to provide enough information about your environment to enable us (and Oracle) to repeat this otherwise this is going to get resolved as WORKSFORME.
Comment 2 Gustavo Lopes 2014-07-07 22:44:30 UTC
I tested this on Ubuntu Trusty amd64. I can also reproduce on my home machine, which runas arch (amd64).

You can try this Vagrantfile:

Vagrant.configure('2') do |config|
  config.vm.box = "ubuntu/trusty64"

  provision_script = <<-eos
      cat > ~vagrant/run-test << eod
#!/bin/bash

# only one version of Java installed; no need to set JAVA_HOME
# so Groovy picks the correct one
aptitude install -y oracle-java7-installer groovy
java -version
strace -f -o /tmp/hh groovy -e 'new ServerSocket(60001, 1, InetAddress.getByName("localhost")).accept()'
eod
      chmod +x ~vagrant/run-test

      apt-get install -y python-software-properties strace vim
      add-apt-repository -y ppa:webupd8team/java
      apt-get update
    eos

  config.vm.provision "shell", inline: provision_script
end

after ssh'ing, run sudo ./run-test, accept the license and then Ctrl+C. You can then instpect /tmp/hh
Comment 3 Gustavo Lopes 2014-07-07 23:37:22 UTC
The strace output may actually be a bug in strace because poll() takes a signed int as timeout; it would seem strace is interpreting the value as an unsigned int. Indeed:

$ cat a.c && gcc a.c && strace ./a.out 
#include <poll.h>
#include <stdio.h>

void main() {
	struct pollfd fd = { .fd = fileno(stdin), .events = POLLIN };
	int timeout = -1;

	poll(&fd, 1, timeout);
}
execve("./a.out", ["./a.out"], [/* 57 vars */]) = 0
brk(0)                                  = 0xeb5000
(...)
poll([{fd=0, events=POLLIN}], 1, 4294967295^CProcess 6565 detached

That leaves me (short of a bug in glibc or the kernel) with no explanation for the exception I got, which takes quite a while to reproduce, though if poll doesn't use the monotonic clock, there may be an faster way. I'll check tomorrow.
Comment 4 Gustavo Lopes 2014-07-07 23:53:04 UTC
A quick observation: the poll syscall used to take a long, and now takes an int. Maybe strace hasn't been updated yet:

http://lwn.net/Articles/483078/
Comment 5 Mark Thomas 2014-07-08 11:59:21 UTC
I was testing on Ubuntu Precise. After updating to Trusty I see the same value as you of 4294967295. That, at least, explains why we were seeing different values.

I've looked through both the Tomcat code and the Java 7 code and I don't see anything wrong (although I am no C/C++ expert). Neither do I see anything that explain the behaviour you are seeing.

I think the next steps are to see if you can reproduce this issue with a test case that doesn't take ~50 days to run.
Comment 6 Konstantin Kolinko 2014-07-08 12:08:51 UTC
Javadoc for ServerSocket.accept() says that SocketTimeoutException can be thrown by the method "if a timeout was previously set with setSoTimeout". [1]


[1] http://docs.oracle.com/javase/7/docs/api/java/net/ServerSocket.html#accept%28%29

BTW,
Javadoc for ServerSocket.setSoTimeout() says that to set an infinite timeout one uses the value of "0". Is somebody confusing '0' and '-1'?

I think the javadoc for ServerSocket.accept() implies that the default timeout is infinite. So I think it is a JRE bug.


In general it makes some sense to protect against this. A timeout is not a "receiving a valid shutdown command" event, and encountering a timeout does not make the server socket an invalid one, so we should be able to continue.
Comment 7 Mark Thomas 2014-07-08 12:37:43 UTC
(In reply to Konstantin Kolinko from comment #6)
> BTW,
> Javadoc for ServerSocket.setSoTimeout() says that to set an infinite timeout
> one uses the value of "0". Is somebody confusing '0' and '-1'?

Not that I can see having looked at the JRE code.

> I think the javadoc for ServerSocket.accept() implies that the default
> timeout is infinite. So I think it is a JRE bug.

That is a possibility, but having looked at the JRE code, I don't see where the problem is.

> In general it makes some sense to protect against this. A timeout is not a
> "receiving a valid shutdown command" event, and encountering a timeout does
> not make the server socket an invalid one, so we should be able to continue.

Agreed. There is no reason I can think of that accept should time out so we can certiainly add some protection against this issue here. We can probably add some useful debug info as well.
Comment 8 Mark Thomas 2014-07-08 20:32:22 UTC
Note that the protection and debug info is going to go into Tomcat 8. It would help a lot if you could upgrade at least one system experiencing this error to Tomcat 8.
Comment 9 Mark Thomas 2014-07-08 22:09:11 UTC
Workaround added to 8.0.x for 8.0.10 onwards. If the timout occurs a warning is logged that reports how long accept was blocking for before the timeout.
Comment 10 Mark Thomas 2014-07-09 10:35:47 UTC
The workaround and simple debug code was pretty simple so I have backported it to 7.0.x (it will be in 7.0.55 onwards) and proposed if for 6.0.x.
Comment 11 Mark Thomas 2014-07-29 09:32:11 UTC
Work around added to 6.0.x for 6.0.42 onwards.
Comment 12 fach hoch 2014-08-05 03:20:21 UTC
is this fix available in tomcat 7.0.50?  I am getting this error in jdk1.7 64 bit tomcat 7.0.50
Comment 13 Chuck Caldarale 2014-08-05 03:27:38 UTC
(In reply to fach hoch from comment #12)
> is this fix available in tomcat 7.0.50?

No, it's in 7.0.55, as stated in comment #10.