Affects at least versions 2.0.52 and 2.2.x. Not sure about 2.3.x; it's pretty different. I ran into bug 34332 with httpd 2.0.52 and apr 0.9.6. It was frustrating to diagnose because Apache just skipped the end of one chunk and went on to the next. My Subversion client said this: $ svn co http://example.com/svn/foo ... svn: REPORT request failed on '/svn/!svn/vcc/default' svn: REPORT of '/svn/!svn/vcc/default': Chunk delimiter was invalid (http://example.com) Although bug 34332 has been fixed in APR, Apache's behavior still confounds diagnosis of such conditions. It continues to use the connection if a write has returned EAGAIN. In core_output_filter it logs this: ap_log_error(APLOG_MARK, APLOG_INFO, rv, c->base_server, "core_output_filter: writing data to the network"); but that's a priority below the default LogLevel, so I didn't see it. This bit here: /* No need to check for SUCCESS, we did that above. */ if (!APR_STATUS_IS_EAGAIN(rv)) { c->aborted = 1; } /* The client has aborted, but the request was successful. We * will report success, and leave it to the access and error * logs to note that the connection was aborted. */ is broken. The status should never be EAGAIN here, and if it is, aborting the connection would be better than ignoring it. The conditional just makes the comment wrong.
Created attachment 18310 [details] Fix for httpd-2.2.x branch
Tested it this morning; my fix was not enough. Stepping back a second, I see this on an strace of brokenness: here it's trying to send a 130203-byte chunk. Only 27487 bytes get sent out, leaving 102716. writev(12, [{"1fc9b\r\n", 7}, {"5xwAAIagAJRlhdwPgIoCRoJ0lkWDEICH"..., 3611}, {"QtsAfKloOGu5EQW50evUC2wwSilrLzKE"..., 126592}, {"\r\n", 2}], 4) = 27494 Here it succeeds in sending 66608 bytes, leaving 36108. poll([{fd=12, events=POLLOUT, revents=POLLOUT}], 1, 120000) = 1 writev(12, [{"IEBwkBEAAABQMCAwMIBwUJAgMEBREGBw"..., 102716}, {"\r\n", 2}], 2) = 66608 Here it succeeds in sending 33304 bytes, leaving 2804. poll([{fd=12, events=POLLOUT, revents=POLLOUT}], 1, 120000) = 1 writev(12, [{"ePFoc8+VaNcmDB5ycecSbTE9/Ps2R7ch"..., 36108}, {"\r\n", 2}], 2) = 33304 ...next write times out: poll([{fd=12, events=POLLOUT}], 1, 120000) = 0 [...some reads from another descriptor...] And here it goes on to the next chunk, completely forgetting about those 2804 bytes. writev(12, [{"12e8e\r\n", 7}, {"AACD4lSOD4OxYICVU1GVF41QlFGFRZRl"..., 2452}, {"hDB/ jOkpPxHqDQJh4NcLNujizjHb8wSl"..., 75002}, {"\r\n", 2}], 4) = 10136 With the patch below and another one that makes writev_it_all log everything (different run), I see this: [Thu May 18 08:32:42 2006] [error] writev_it_all 130212 bytes: take 1: apr_sendv 130212 bytes -> 27494 [Thu May 18 08:32:42 2006] [error] writev_it_all 130212 bytes: take 2: apr_sendv 102718 bytes -> 66608 [Thu May 18 08:32:42 2006] [error] writev_it_all 130212 bytes: take 3: apr_sendv 36110 bytes -> 33304 [Thu May 18 08:34:42 2006] [error] (70007)The timeout specified has expired: writev_it_all 130212 bytes: take 4: apr_sendv 2806 bytes -> 0 [Thu May 18 08:34:42 2006] [info] (70007)The timeout specified has expired: core_output_filter: writing data to the network [Thu May 18 08:34:42 2006] [error] writev_it_all 77463 bytes: take 1: apr_sendv 77463 bytes -> 10136 There are two problems here: (1) APR_TIMEUP; our network's pretty messed up right now so this isn't that surprising. (2) core_output_filter sees the error and gives up on data it has to send without aborting the connection, causing further chunks to be messed up and diagnosis to be frustrating. I opened this bug for #2. Apparently setting c->aborted isn't enough, though it looks like a bunch of callers check it before writing stuff. I'll look around a bit more.
If c->aborted gets set unconditionally as your patch proposes and things still do not work I think this is a bug of the caller (seems to be subversion in your case) as it should not call ap_pass_brigade again if c->aborted is set. But maybe the following patch is a useful sanity check to ensure that no further data gets sent: Index: core_filters.c =================================================================== --- core_filters.c (Revision 406722) +++ core_filters.c (Arbeitskopie) @@ -542,6 +542,20 @@ apr_read_type_e eblock = APR_NONBLOCK_READ; apr_pool_t *input_pool = b->p; + /* + * Maybe we have been called by a buggy part in the lower filter chain + * or a buggy handler on a connection that has been already aborted. + * Just destroy the brigade and return with APR_SUCCESS. + */ + if (c->aborted) { + ap_log_cerror(APLOG_MARK, APLOG_ERR, rv, c, "core_output_filter:" + " Trying to write on a connection aborted by the" + " client."); + if (b) + apr_brigade_destroy(b); + return APR_SUCCESS; + } + if (ctx == NULL) { ctx = apr_pcalloc(c->pool, sizeof(*ctx)); net->out_ctx = ctx;
Yeah, that's helpful. I'm totally in favor of anything that makes stuff easier to diagnose. Looks like the call stack is this: window_handler * svn_stream_write ** brigade_write_fn *** apr_brigade_write **** ap_filter_flush ***** ap_pass_brigade ****** ap_byterange_filter ******* ap_pass_brigade ******** ap_content_length_filter ********* ap_pass_brigade ********** chunk_filter *********** ap_pass_brigade ************ core_output_filter <- sets aborted ... ** svn_stream_write again ... ************ core_output_filter again <- errors Subversion's only use of aborted seems to be this in dav_svn__send_xml: /* ### check for an aborted connection, since the brigade functions don't appear to be return useful errors when the connection is dropped. */ if (output->c->aborted) return svn_error_create(SVN_ERR_APMOD_CONNECTION_ABORTED, 0, NULL); If the contract on brigade writing is that the caller should check c->aborted before and after, then perhaps this should be moved into brigade_write_fn. I will take it up on the Subversion mailing list and see what happens.
Created attachment 18319 [details] Patch to mod_dav_svn to avoid this problem Could you try this patch to mod_dav_svn and see if it fixes the problem?
Created attachment 18320 [details] Patch to mod_dav_svn to avoid this problem Oops, here's one that'll probably compile, and use the correct error constant...
With the mod_dav_svn patch, I end up in core_output_filter on an aborted connection with this call stack: #3 0x00c526a8 in core_output_filter (f=0x9f97080, b=0xa013bb0) at /usr/src/debug/httpd-2.0.52/ server/core.c:3966 #4 0x00c49009 in ap_pass_brigade (next=0x9f97080, bb=0xa013bb0) at /usr/src/debug/ httpd-2.0.52/server/util_filter.c:511 #5 0x00c33423 in chunk_filter (f=0xa013ab0, b=0xa013bb0) at /usr/src/debug/httpd-2.0.52/ modules/http/http_core.c:217 #6 0x00c49009 in ap_pass_brigade (next=0xa013ab0, bb=0xa013bb0) at /usr/src/debug/ httpd-2.0.52/server/util_filter.c:511 #7 0x00c4bae5 in ap_content_length_filter (f=0x9fcce48, b=0xa013bb0) at /usr/src/debug/ httpd-2.0.52/server/protocol.c:1241 #8 0x00c49009 in ap_pass_brigade (next=0x9fcce48, bb=0xa013bb0) at /usr/src/debug/ httpd-2.0.52/server/util_filter.c:511 #9 0x00c4b640 in end_output_stream (r=0x9fcc078) at /usr/src/debug/httpd-2.0.52/server/ protocol.c:1032 #10 0x00c387de in ap_process_request (r=0x9fcc078) at /usr/src/debug/httpd-2.0.52/modules/http/ http_request.c:256 #11 0x00c3363f in ap_process_http_connection (c=0x9f96b80) at /usr/src/debug/httpd-2.0.52/ modules/http/http_core.c:250 #12 0x00c46349 in ap_run_process_connection (c=0x9f96b80) at /usr/src/debug/httpd-2.0.52/ server/connection.c:42 #13 0x00c466b4 in ap_process_connection (c=0x9f96b80, csd=0x0) at /usr/src/debug/httpd-2.0.52/ server/connection.c:175 #14 0x00c39901 in child_main (child_num_arg=Variable "child_num_arg" is not available. ) at /usr/src/debug/httpd-2.0.52/server/mpm/prefork/prefork.c:609 #15 0x00c39af8 in make_child (s=0x9eb8e88, slot=0) at /usr/src/debug/httpd-2.0.52/server/mpm/ prefork/prefork.c:649 #16 0x00c39c17 in startup_children (number_to_start=8) at /usr/src/debug/httpd-2.0.52/server/ mpm/prefork/prefork.c:721 #17 0x00c3a4db in ap_mpm_run (_pconf=0x9eb70a8, plog=0x9ee3158, s=0x9eb8e88) at /usr/src/ debug/httpd-2.0.52/server/mpm/prefork/prefork.c:940 #18 0x00c4134a in main (argc=2, argv=0xbfff3694) at /usr/src/debug/httpd-2.0.52/server/main.c: 618 which I guess means Subversion isn't the only such caller. Is it just me or is checking c->aborted all the time a pretty awkward thing to expect callers to do? I'm not familiar enough with Apache's innards to know what returning failures from core_output_filter would do, but it does seem more inline with callers' usual expectations.
Now that bug 31759 is fixed there should be no need for the core output filter to be doing this trick of returning APR_SUCCESS on failure AFAIK. It should now be safe and correct to always just return the error on that error path (I don't know why the EAGAIN check would be needed there either); and also add the check per comment 3, which can also just return APR_ECONNABORTED rather than hiding the failure.
What is the status of this bug and what are possible solutions? We have Subversion users running Apache 2.2.8, with mod_dav_svn 1.5.0 that still get this error pretty regularly, so I assume there is still some kind of fix needed.
This should be fixed on the 2.2.x branch now as of r791454.