ap_lingering_close is supposed to read any leftover data on the connection until there is no more data or until MAX_SECS_TO_LINGER (30) seconds have elapsed. But it turns out that it only performs up to 15 reads of (at most) 512 bytes each.
Created attachment 15353 [details] Patch that fixes the problem. This patch fixes the issue, plus two warnings I got with -Wall about apr_shutdown and apr_recv not being declared.
apr_shutdown should be part of apr_compat.h. The submitted patch is against the 2.0.x branch, not trunk.
This was mentioned on http-wg a while ago, indeed. Does the current behaviour actually cause practical problems? The change introduces an extra gettimeofday() call into the normal processing of every request, so it needs good justification.
I found this out because I was sending a POST of a few MB to a web service, but I got the URL wrong and saw that there were 15 small reads when closing before I got the RST when trying to read from the socket.
OK, there are a few alternatives here:

- Take the hit on the uncommon path where the read() doesn't get EOF the first time through, and keep calling read()/apr_time_now() until 30 seconds have *really* passed.
  Pro: equally as safe as the 1.3 code.
  Con: more overhead.

- Bump the tmp buffer size to 8K and lower the read() timeout to 1 second. This way the server could eat 30*8K=240K bytes without additional overhead, giving a significantly better chance of getting the response to the client than just 15*512=~7.5K.
  Pro: little more overhead than the current 2.0 code.
  Con: still less safe than 1.3.
For what it's worth, I'd go for the first alternative and linger for at most ~32s, while not incurring any additional penalty on the common path.
The reason that such code isn't in there now is to avoid all the syscalls (retrieving the time). Some cleverness may allow an implementation that is willing to wait longer (close to MAX_SECS_TO_LINGER) without retrieving the time so often.
Created attachment 15375 [details] Second attempt. Patch against httpd-2.0.x. How about this? There are no calls to time() involved. The first read polls for up to 30s and subsequent ones only for 0.5s, with a cap on the number of reads that doubles the maximum amount that can be drained compared to the current code.
Oops. Forgot to increment nread_ops in the loop.
I don't see how that implementation actually fixes the problem. Or am I missing something fundamental?

The *problem* is that the server is not reading enough bytes from the socket to eat up the TCP window and prevent an RST that would stop the client from seeing the response. Just changing the timeouts makes no difference; the solution needs to actually increase the number of read() calls made (and/or increase the buffer size passed to read()).

In the situation where lingering close is helping, the first read call is *not* going to time out; the TCP receive buffer for this socket on the server will already be non-empty, so it will necessarily return data immediately. Changing that first timeout just introduces a new problem: if the client disappears completely after reading the response, the server will now hang around for thirty seconds waiting for a FIN that will never arrive, rather than just for two seconds.
Fixed on trunk and back-ported for 2.1.8 and later. http://svn.apache.org/viewcvs?rev=291452&view=rev
*** Bug 17722 has been marked as a duplicate of this bug. ***