the TCP_DEFER_ACCEPT code for linux sets the timeout to 1 second. this is totally broken... ideally the value should be configurable via AcceptFilter or otherwise, but at a minimum it should be something like 30 seconds. it's only by sheer luck that the current code doesn't cause havoc -- because the kernel itself isn't doing the right thing with the timeout and is waiting a lot longer than specified. -dean
Created attachment 19332 [details] set TCP_DEFER_ACCEPT to 30 seconds
see also http://marc.theaimsgroup.com/?l=linux-netdev&m=116753348815044&w=2
(In reply to comment #0) > the TCP_DEFER_ACCEPT code for linux sets the timeout to 1 second. this is > totally broken... ideally the value should be configurable via AcceptFilter or > otherwise, but at a minimum it should be something like 30 seconds. I agree that it makes sense to keep this value configurable. But what do you think is actually "totally broken" about the 1 second setting (provided the kernel did the right thing)? Do you think too many clients with a slow / bad connection to the httpd server will not get a connection because it is dropped by the kernel before they can send data? Question to the BSD guys: Is it possible to set a timeout on the BSD accept filters accf_data and accf_http, specifying how long they should wait for a request before dropping the socket?
http://lkml.org/lkml/2000/10/21/80 also provides an interesting discussion of the Linux kernel behaviour.
if there's any amount of packet loss (or geosynchronous orbit satellites involved), the 3-way handshake will never complete if the server gives up after 1 second... i'm sure if you dig through the RFCs you'll find standards requirements backing this up. -dean
sorry, but i can't stand this being left as NEEDINFO when it should be obvious to anyone that a one second timeout for the 3rd packet in a 3-way handshake is insane.
btw, i've been running with a 30s setting for TCP_DEFER_ACCEPT for 13 months now on a 40+ req/s website without any problems. i really don't see the harm in changing apache to have a sane setting for this value.
Agreed, thanks for the patch and analysis; committed as http://svn.apache.org/viewvc?view=rev&rev=501364 I've prodded some kernel guys, hopefully someone can clarify the semantics of the option argument and follow up your netdev post.
The Linux tcp(7) man page indicates that the parameter is NOT the number of seconds the kernel waits but instead is the number of attempts the TCP stack should make to complete the connection. This would indicate a value like "3" to be much more sane than "30", which could invite abuse. Quote below: TCP_DEFER_ACCEPT "Allows a listener to be awakened only when data arrives on the socket. Takes an integer value (seconds), this can bound the maximum number of attempts TCP will make to complete the connection. This option should not be used in code intended to be portable."
(In reply to comment #9) > The Linux tcp(7) man page indicates that the parameter is NOT the number of > seconds... um: > Quote below: > > TCP_DEFER_ACCEPT > "Allows a listener to be awakened only when data arrives on the socket. Takes an > integer value (seconds) ^^^^^^^
Please kill me. Sorry. (In reply to comment #10) > (In reply to comment #9) > > The Linux tcp(7) man page indicates that the parameter is NOT the number of > > seconds... > > um: > > > Quote below: > > > > TCP_DEFER_ACCEPT > > "Allows a listener to be awakened only when data arrives on the socket. Takes an > > integer value (seconds) > ^^^^^^^
(In reply to comment #3) > Question to the BSD guys: Is it possible to set a timeout on the BSD accept > filters accf_data and accf_http, specifying how long they should wait for a > request before dropping the socket? No, there isn't a way to set the timeout on the FreeBSD accept filters.
CC myself on FreeBSD related bugs
This exact problem causes havoc if you have many slow clients on slow networks (GPRS). We actually hit this problem in production (many simultaneous clients, >1000, on a single server), with >80% of connection attempts failing. This is a really hard problem to debug.

I cannot believe Apache ships with TCP_DEFER_ACCEPT at 1 sec, or even enabled at all. I don't see the possible gain from having this feature enabled. It should be turned off by default; it has no benefits, only downsides. Can Ubuntu change the default config to include:

AcceptFilter http none
AcceptFilter https none

ref: varnish had it enabled for a while but they disabled it too [1]
ref: ubuntu ticket [2]

[1] https://github.com/Movile/varnish/commit/687bacb3152ebc8b00b8dd737ef1dedb12bd4ee2
[2] https://bugs.launchpad.net/ubuntu/+source/apache2/+bug/134274?comments=all
(In reply to harm from comment #14) > I cannot believe Apache ships with TCP_DEFER_ACCEPT on 1 sec, or even > enabled at all. This isn't the case anymore; the new value is 30s since 2.2.28 (and has always been 30s in 2.4.x), still hardcoded though. By the way, TCP_DEFER_ACCEPT=1 is not really one second, because TCP_SYNCNT (defaulting to the tcp_synack_retries sysctl when the option is not set, as in httpd) is always honored: the client's final 3-way handshake ACK is continuously dropped during defer-accept, and since there is no real SYN/ACK to send beyond the one already sent for the SYN, the effective timeout is the time the server would have needed to send that many SYN/ACKs. Consequently, one can also play with net.ipv4.tcp_synack_retries (or TCP_SYNCNT) to adjust the TCP_DEFER_ACCEPT timeout as needed (above 30s). > I don't see the possible gain to have this feature enabled. Well, the listener won't accept (spend resources for) spurious connections, which stay in kernel land.
Fixed in 2.2.28 (r1608298).
(In reply to Yann Ylavic from comment #15) > This isn't the case anymore, the new value is 30s since 2.2.28 (and has > always been 30s in 2.4.x), still hardcoded though. Confirmed; we had these problems in 2.2.22, sorry for bringing it up again. > Well, the listener won't accept (spend resources for) spurious connections, > which stay in kernel land. Well... that's what's advertised, yes. It was somehow broken though (with a value of 1), resulting in connections not being passed up from kernel land at all. So at least in 2.2.22 I doubt you'll _ever_ really see any benefits. (We concluded this after many many hours staring at wireshark logs, together with our telco partner.) To me it sounds like a fishy optimization, without (measured) benefits. Maybe this made sense when linux/bsd were competing to please random, irrelevant benchmarks (a decade ago), but I doubt this is valid nowadays.