Summary: | corrupt gzip output when compression=on in server.xml | ||
---|---|---|---|
Product: | Tomcat 6 | Reporter: | David Marcks <dmarcks> |
Component: | Connectors | Assignee: | Tomcat Developers Mailing List <dev> |
Status: | RESOLVED FIXED | ||
Severity: | normal | ||
Priority: | P2 | ||
Version: | 6.0.33 | ||
Target Milestone: | default | ||
Hardware: | PC | ||
OS: | Linux | ||
Attachments: |
Gzip Corruption Repro WAR
Isolated Reproduction without using Tomcat 2011-11-07_tc6_FlushableGZIPOutputStream.patch 2011-11-11_tc6_FlushableGZIPOutputStream-v2.patch |
Description
David Marcks
2011-11-01 22:15:16 UTC
Created attachment 27887 [details]
Isolated Reproduction without using Tomcat
Many thanks for the excellent test case. It looks like it was very difficult to track down. We really do appreciate your efforts on this. I have converted the stand-alone reproduction JAR to a Tomcat test case. It isn't yet in the standard test suite as it currently always fails. I noticed that the following is sufficient to reproduce the issue: - send parts 19 to 28, flush, 29, flush, close I have therefore merged parts 19 to 28 for the test case. The failure is a CRC error. At this point this looks like a JVM bug. We don't normally address JVM bugs in the Tomcat codebase but if we can figure out exactly what causes this bug (it seems to be related to a very precise state) then we might be able to detect the state and prevent the flush that triggers the bug. Failing that, it should be possible to make use of the flushable GZIP filter configurable. As a workaround, you may be able to add an explicit flush() to the page(s) where this occurs. Note that you can run the test by adding the following line to the build.properties file: test.entry=org.apache.coyote.http11.filters.TesterFlushableGZIPOutputStream Just for reference: If I run the test through Ant it fails in an odd way: [[[ Testsuite: org.apache.coyote.http11.filters.TesterFlushableGZIPOutputStream Tests run: 1, Failures: 0, Errors: 1, Time elapsed: 0,031 sec Testcase: testBug52121 took 0,016 sec Caused an ERROR null java.lang.NullPointerException at org.apache.catalina.util.IOTools.flow(IOTools.java:74) at org.apache.catalina.util.IOTools.flow(IOTools.java:85) at org.apache.coyote.http11.filters.TesterFlushableGZIPOutputStream.testBug52121(TesterFlushableGZIPOutputStream.java:47) ]]] I cannot explain this NPE. If I run it as JUnit test from with Eclipse IDE I get the IO error: [[[ java.io.IOException: Corrupt GZIP trailer at java.util.zip.GZIPInputStream.readTrailer(GZIPInputStream.java:203) at java.util.zip.GZIPInputStream.read(GZIPInputStream.java:94) at java.io.FilterInputStream.read(FilterInputStream.java:90) at org.apache.catalina.util.IOTools.flow(IOTools.java:74) at org.apache.catalina.util.IOTools.flow(IOTools.java:85) at org.apache.coyote.http11.filters.TesterFlushableGZIPOutputStream.testBug52121(TesterFlushableGZIPOutputStream.java:63) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:44) at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15) at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:41) at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20) at org.junit.runners.BlockJUnit4ClassRunner.runNotIgnored(BlockJUnit4ClassRunner.java:79) at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:71) at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:49) at org.junit.runners.ParentRunner$3.run(ParentRunner.java:193) at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:52) at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:191) at org.junit.runners.ParentRunner.access$000(ParentRunner.java:42) at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:184) at org.junit.runners.ParentRunner.run(ParentRunner.java:236) at org.eclipse.jdt.internal.junit4.runner.JUnit4TestReference.run(JUnit4TestReference.java:50) at org.eclipse.jdt.internal.junit.runner.TestExecution.run(TestExecution.java:38) at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:467) at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:683) at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.run(RemoteTestRunner.java:390) at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.main(RemoteTestRunner.java:197) ]]] The sources are trunk@1197344, both were run with jdk 6u29 (build 1.6.0_29-b11) on Windows XP 32-bit. I think I fixed it with r1197382. Regarding the NPE when testing with Ant: it is a fault in the test case. The resources are not copied when compiling the test classes. Thus classloader.getResourceAsStream() does not find the resource and returns null. Unfortunately, that hasn't fixed it. It just avoids the call that triggers the problem. I've added an additional write to expose the issue again. I also found a way to crash the JVM. This makes me more convinced this is as JVM bug. Several additional notes A link to the thread by Christopher Taylor mentioned by OP: http://tomcat.markmail.org/thread/v56xpi5uz4end7cd Note, that the Christopher Taylor's thread contains an attachment but only containing the corrupted gzip stream. 1. I can observe the CRC error with 3-rd party tool, 7-zip. It cannot decompress the file in Christopher Taylor's message, nether it can the one produced by the test case for this issue before I committed my changes. It reports CRC error. Christopher Taylor's file was produced by Tomcat 7.0.14. Thus not only Java report that the stream is broken. 2. According to the sources of deflate.c of zlib library that can be found in many placed, changing compression level which is performed by "deflateParams" call indeed causes deflate(strm, Z_PARTIAL_FLUSH) flushing the buffer. That is in some old version of the code. In more recent ones it is just a deflate(strm, Z_BLOCK). 3. Looking at the content of the streams produced by compression. Both the stream in Christopher Taylor's message and the one produced by the JUnit test case for this bug have uncompressed data near the end of the stream. That is "</id><version>"... or "<div "... respectively - about 200 bytes of plain text. It just means that the flush in deflateParams() call failed to flush the whole stream. That happens when the output buffer used by deflate.c (that is configured by 'size' parameter in GZIPOutputStream constructor) is too small. There is a loop that deals with this in DeflaterOutputStream.deflate(). Because the effective compression level at that time is Deflater.NO_COMPRESSION the data is flushed uncompressed. The following follows: 1) If I increase size of the buffer by the following change in FlushableGZIPOutputStream constructor, the problem reported here disappears: - super(os); + super(os, 64*1024); 4. The Java bugs 4255743 and 4813885 that are mentioned in the Javadoc for the FlushableGZIPOutputStream class are both fixed in Java 7. There is an additional boolean argument to GZIPOutputStream and DeflaterOutputStream constructors in Java 7, "syncFlush". If it is set to true, then calling mere flush() results in flushing the compressor. 5. Some documentation mentions that there is a trick in gzip when partial flush is performed. It sends either one block or two blocks. http://www.bolet.org/~pornin/deflate-flush.html Further fixed in r1197578 I removed the trick with empty byte array and direct call to def.setInput(). Calling setInput() with empty data never happens in the standard write() method. The latest fix looks good to me. I have applied it to 7.0.x and it will be included in 7.0.23 onwards. Moving this issue to 6.0.x as it also needs to be fixed there. Created attachment 27905 [details]
2011-11-07_tc6_FlushableGZIPOutputStream.patch
Patch for Tomcat 6.0
Applied to 6.0 with r1200603 and will be in 6.0.34. Reverted in 6.0 in r1200620. It is broken. Browsers display empty page with compression="on". Wget says "No data received". As if connection has been closed. Debugging with trunk, there is problem in OutputBuffer.close(). It does: doFlush(false); closed = true; It closes underlying stream without closing FlushableGZIPOutputStream first. Consequently the attempt to write out the last byte at FlushableGZIPOutputStream#close() fails with an IOException. The issue in comment 11 fixed in trunk and 7.0 with r1200696 and r1200698 respectively, will be in 7.0.23. Created attachment 27924 [details]
2011-11-11_tc6_FlushableGZIPOutputStream-v2.patch
Updated patch for 6.0.x.
(In reply to comment #11) >> Debugging with trunk, there is problem in OutputBuffer.close(). >> It does: >> doFlush(false); >> closed = true; >> >> It closes underlying stream without closing FlushableGZIPOutputStream first. >> Consequently the attempt to write out the last byte at >> FlushableGZIPOutputStream#close() fails with an IOException. Just to clear misinformation: the statements quoted above are wrong. It behaved as if the stream were closed, but the real cause was finished Deflater. It has nothing to do with the underlying buffer and the code fragment above. See r1200696 Fixed in 6.0 with r1201193 and will be in 6.0.34. |