Bug 55772

Summary: Flushing AsyncContext response writer after ClientAbortException (BrokenPipe) causes request state to leak
Product: Tomcat 7 Reporter: Todd West <todd.west>
Component: CatalinaAssignee: Tomcat Developers Mailing List <dev>
Status: RESOLVED FIXED    
Severity: critical CC: seth.pellegrino
Priority: P2    
Version: 7.0.42   
Target Milestone: ---   
Hardware: All   
OS: Linux   
Attachments: Test case to reproduce request state (cookie) leakage

Description Todd West 2013-11-13 04:45:21 UTC
Created attachment 31040 [details]
Test case to reproduce request state (cookie) leakage

This issue is easily reproducible with the attached test case. Request state is leaked between requests due to the ADAPTER_NOTES note on the coyote request containing the catalina connector request (which holds the cookies from the first request) and never gets cleaned up in this error case (and maybe others similar to it).

It seems like it would be worthwhile to clear out the notes field from the coyote request when we recycle() it to prevent future issues like this.
Comment 1 Seth P 2013-11-13 18:20:09 UTC
From our analysis of the issue, the problem occurs because of AbstractHttp11Processor's reentrancy via the Response object. From the following stack trace:

java.io.IOException: Broken pipe
	at sun.nio.ch.FileDispatcherImpl.write0(Native Method)
	at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:47)
	at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:94)
	at sun.nio.ch.IOUtil.write(IOUtil.java:65)
	at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:450)
	at org.apache.tomcat.util.net.NioChannel.write(NioChannel.java:123)
	at org.apache.tomcat.util.net.NioBlockingSelector.write(NioBlockingSelector.java:94)
	at org.apache.tomcat.util.net.NioSelectorPool.write(NioSelectorPool.java:174)
	at org.apache.coyote.http11.InternalNioOutputBuffer.writeToSocket(InternalNioOutputBuffer.java:163)
	at org.apache.coyote.http11.InternalNioOutputBuffer.flushBuffer(InternalNioOutputBuffer.java:242)
	at org.apache.coyote.http11.InternalNioOutputBuffer.flush(InternalNioOutputBuffer.java:94)
	at org.apache.coyote.http11.AbstractHttp11Processor.action(AbstractHttp11Processor.java:793)
	at org.apache.coyote.Response.action(Response.java:174)
	at org.apache.catalina.connector.OutputBuffer.doFlush(OutputBuffer.java:359)
	at org.apache.catalina.connector.OutputBuffer.flush(OutputBuffer.java:326)
	at org.apache.catalina.connector.CoyoteWriter.flush(CoyoteWriter.java:98)
	...
	<application code>
	...
	at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:171)
	at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:99)
	at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:118)
	at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:408)
	at org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1023)
	at org.apache.coyote.AbstractProtocol.AbstractConnectionHandler.process(AbstractProtocol.java:589)
	at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.run(NioEndpoint.java:1686)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
	at java.lang.Thread.run(Thread.java:722)

The key insight is that the AbstractHttp11Processor.process(AbstractHttp11Processor.java:1023) frame near the bottom is inside the same instance as the AbstractHttp11Processor.action(AbstractHttp11Processor.java:793) frame near the top. The latter sets the instance-level error flag as a result of the broken pipe exception, causing the #process invocation to return SocketState.CLOSED. Because the request is async, the CoyoteAdapter has not cleaned up the catalina request in its finally block (the same object as referenced through ADAPTER_NOTES as Todd describes), but the SocketState.CLOSED indicates to the connection handler that the processor is ready for recycling (which does NOT recycle the catalina request) and re-use.
Comment 2 Mark Thomas 2013-11-15 13:48:28 UTC
Thanks for the test case and analysis. I modified it a little to incorporate it into the existing test case for AbstractHttp11Processor, to speed it up in the failure case (setting the async timeout less than the default of 30s) and to fix an NPE once the test passed.

The fix has been applied to 8.0.x for 8.0.0-RC6 onwards and 7.0.x for 7.0.48 onwards. The test case has also been added to both versions.

Thanks again for a very easy to fix bug report.