ap_lingering_close is supposed to read any leftover data on the connection until there is no more data or until MAX_SECS_TO_LINGER (30) seconds have elapsed. But it turns out that it only performs up to 15 reads of (at most) 512 bytes each.
Created attachment 15353 [details] Patch that fixes the problem. This patch fixes the issue, plus two warnings I got with -Wall about apr_shutdown and apr_recv not being declared.
apr_shutdown should be part of apr_compat.h. The submitted patch is against the 2.0.x branch, not trunk.
This was mentioned on http-wg a while ago, indeed. Does the current behaviour actually cause practical problems? The change introduces an extra gettimeofday() call into the normal processing of every request, so it needs good justification.
I found this out because I was sending a POST of a few MB to a web service, but I got the URL wrong and saw that there were 15 small reads when closing before I got the RST when trying to read from the socket.
OK, there are a few alternatives here:

- Take the hit on the uncommon path where the read() doesn't get EOF the first time through, and keep calling read()/apr_time_now() until 30 seconds have *really* passed.
  Pro: equally as safe as the 1.3 code.
  Con: more overhead.

- Bump the tmp buffer size to 8K and lower the read() timeout to 1 second. This way the server could eat 30*8K=240K bytes without additional overhead, giving a significantly better chance of getting the response to the client than just 15*512=~7.5K.
  Pro: little more overhead than the current 2.0 code.
  Con: still less safe than 1.3.
For what it's worth, I'd go for the first alternative and linger for at most ~32s, while not incurring any additional penalty on the common path.
The reason that such code isn't in there now is to avoid all the syscalls (retrieving the time). Some cleverness may allow an implementation that is willing to wait longer (close to MAX_SECS_TO_LINGER) without retrieving the time so often.
Created attachment 15375 [details] Second attempt. Patch against httpd-2.0.x. How about this? There are no calls to time() involved. The first read polls for up to 30s and subsequent ones only for 0.5s, with a cap on the number of reads that doubles the maximum amount that can be drained compared to the current code.
Oops. Forgot to increment nread_ops in the loop.
I don't see how that implementation actually fixes the problem. Or am I missing something fundamental?

The *problem* is that the server is not reading enough bytes from the socket to eat up the TCP window and prevent an RST that would stop the client from seeing the response. Just changing the timeouts makes no difference; the solution needs to actually increase the number of read() calls made (and/or increase the buffer size passed to read()).

In the situation where lingering close is helping, the first read call is *not* going to time out; the TCP receive buffer for this socket on the server will already be non-empty, so it will necessarily return data immediately. Changing that first timeout just introduces a new problem: if the client disappears completely after reading the response, the server will now hang around for thirty seconds waiting for a FIN that will never arrive, rather than just for two seconds.
Fixed on trunk and back-ported for 2.1.8 and later. http://svn.apache.org/viewcvs?rev=291452&view=rev
*** Bug 17722 has been marked as a duplicate of this bug. ***