Summary: | AJP APR connector to mod_jk can result in SEND_BODY_CHUNK message after END_RESPONSE message | ||
---|---|---|---|
Product: | Tomcat 6 | Reporter: | brett.dellegrazie |
Component: | Connectors | Assignee: | Tomcat Developers Mailing List <dev> |
Status: | RESOLVED FIXED | ||
Severity: | normal | CC: | brett.dellegrazie |
Priority: | P2 | ||
Version: | 6.0.29 | ||
Target Milestone: | default | ||
Hardware: | PC | ||
OS: | Linux | ||
Attachments: | testcase: Servlet for writing directly to output stream |
Description
brett.dellegrazie
2010-09-14 11:52:20 UTC
There was a patch supplied by Rainer Jung which we used to get more information out of mod_jk. It was how we realised there were out-of order messages in the queue on the Tomcat side. Patch: --- jk_ajp_common.c 2010-02-23 08:26:02.000000000 +0100 +++ jk_ajp_common.c.debug 2010-09-02 18:24:41.429730000 +0200 @@ -904,9 +904,9 @@ cmd < JK_AJP13_SEND_BODY_CHUNK || cmd > AJP13_CPONG_REPLY) { jk_log(l, JK_LOG_WARNING, - "awaited reply cpong, received %d instead. " + "awaited reply cpong, received %d (%d / %d) instead. " "Closing connection", - cmd); + cmd, i, ae->last_op); /* We can't trust this connection any more. */ ajp_abort_endpoint(ae, JK_TRUE, l); JK_TRACE_EXIT(l); Mladen Turk wrote in email on tomcat-users list [Tue, 14 Sep 2010 07:50:55 +0200]: For AJP APR connector 'if (actionCode == ActionCode.ACTION_CLIENT_FLUSH)' doesn't check if the socket was already virtually closed so I guess that would be a first thing to look at. We cannot just check for finished flag thought, cause recycle() would reset that. Instead if there was getOutputSteam() we would need to directly invalidate that stream. This means we would need to remember that stream somewhere. Further more the wrapped socket is 'long' pointing to the native pointer, so we don't have object notification on physical socket close (that's another problem that manifests in JVM core for detached sockets). Thanks for filing that. I'll take a look into that. It would be nice if you could provide a simple jsp or servlet test case (without third party packages) to speed up the hacking. Created attachment 26055 [details]
testcase: Servlet for writing directly to output stream
This is a test servlet for writing simple data directly to the output stream. It should trigger the issue in this bug
I have tried your test servlet, but I cannot see anything suspicious. There is no additional packets following the lorem ipsum. Tested on Tomcat 6.0.29, 6.0.x trunk and Tomcat 7. Both APR and JIO endpoints behave the same. Hi, When using the test servlet the problem appears on the next request when its on the same socket so its a good idea to have a static file around to request immediately after executing the test servlet. The way we did this was to use Jmeter with only 1 thread but executing the two requests in sequence with no wait time. With Apache HTTPD you should see mod_jk (in debug mode) detect an explicit flush packet and reset the socket instead of the usual CPONG expected response as reported originally. Thanks, Brett I'm still not been able to reproduce the stuff. I don't use jmeter but rather a simple ab which can only hit a single uri, so this might be the reason why its not manifested there. BTW, does it happen only with APR connector, or with JIO as well? I can't reproduce this with the latest Tomcat 6 code with either the BIO AJP connector or the APR AJP connector. A check of the mod_jk debug log confirms that that cping/cpong works as expected. I have been using the supplied Servlet and a 46k static file. Without a test case to reproduce this issue, this bug is heading towards being closed as "works for me" . (In reply to comment #8) > I can't reproduce this with the latest Tomcat 6 code with either the BIO AJP > connector or the APR AJP connector. A check of the mod_jk debug log confirms > that that cping/cpong works as expected. > > I have been using the supplied Servlet and a 46k static file. > > Without a test case to reproduce this issue, this bug is heading towards being > closed as "works for me" . Hi, I've tried to reproduce this in a test application as well and the problem just doesn't occur. At present, we can only generate it using our application, sorry. Don't know how to progress this further now. :( Regards, Brett The test solution would have to use the same sort of writer (image in your case I suppose) We know that image classes can crash the APR connector because they try to manage the socket by themselves so double close can occur. This might also cause the SBC to been send after the EOR since with AJP the physical socket is not closed. Again without a workable test case it's just a guess. Without a reproducible test case this is eventually going to get closed as WONTFIX. Can you check if r1066772 fixes the issue? It's for trunk, but applying it for 6.0.x should be trivial Hi Mladen, We switched from using Apache HTTPD to HAproxy late last year because of this issue... therefore it might take me quite a while to verify this. I'll try, but no guarantees. Best Regards, Brett Hey - we just ran into this issue this week, and I finally managed to track it down to this bug after a good two days of work. Gotta get this fixed. I will attempt to both come up with a simple test case and try the provided patch next week. I've proposed r1066772 for 6.0.x since it is the right thing to do and it may help fix this issue. Confirmation that it does fix it would still be helpful. The fix has been back-ported and absent any reports to the contrary, the assumption is that the issue is fixed. |