Created attachment 22257 [details]
performance comparison code

Environment: Ubuntu Linux/amd64 x86, JRE 1.5.0_13-b05, ant.jar from ant-1.7.1

I'd like to use org.apache.tools.zip instead of java.util.zip because of java.util.zip's filename encoding problem, but I've hit a performance problem in org.apache.tools.zip. The attached Java code compresses two files (3 MiB and 2 MiB) with both org.apache.tools.zip and java.util.zip. It shows org.apache.tools.zip is 20x slower than java.util.zip.

Output:
% java -cp .:ant-1.7.1.jar ZipPerformance -apache -jdk
==> Benchmarking
Apache: 95832 [ms]
JDK: 4717 [ms]
I looked at the source code. When we call ZipOutputStream.write(byte[]) for a large byte array:
* org.apache.tools.zip calls Deflater.setInput() once for the whole array
* java.util.zip calls Deflater.setInput() multiple times; each call handles a 512-byte chunk of the array
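The two call patterns can be sketched against java.util.zip.Deflater directly. This is an illustrative demo, not the code from either library; the class and method names are mine. Both variants produce a valid compressed stream - only the granularity of the native setInput()/deflate() calls differs, which is where the slowdown comes from:

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

public class ChunkingDemo {

    // One setInput() call for the whole array - the org.apache.tools.zip pattern.
    static byte[] deflateWhole(byte[] data) {
        Deflater def = new Deflater();
        def.setInput(data);
        def.finish();
        return drain(def, new ByteArrayOutputStream());
    }

    // Feed the array in fixed-size chunks - the java.util.zip pattern
    // (512-byte blocks internally).
    static byte[] deflateChunked(byte[] data, int chunk) {
        Deflater def = new Deflater();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        for (int off = 0; off < data.length; off += chunk) {
            def.setInput(data, off, Math.min(chunk, data.length - off));
            while (!def.needsInput()) {
                out.write(buf, 0, def.deflate(buf));
            }
        }
        def.finish();
        return drain(def, out);
    }

    static byte[] drain(Deflater def, ByteArrayOutputStream out) {
        byte[] buf = new byte[4096];
        while (!def.finished()) {
            out.write(buf, 0, def.deflate(buf));
        }
        return out.toByteArray();
    }

    // Decompress, to verify both variants produce a valid stream.
    static byte[] inflate(byte[] comp, int origLen) throws DataFormatException {
        Inflater inf = new Inflater();
        inf.setInput(comp);
        byte[] out = new byte[origLen];
        int n = 0;
        while (n < origLen && !inf.finished()) {
            n += inf.inflate(out, n, origLen - n);
        }
        return out;
    }

    public static void main(String[] args) throws DataFormatException {
        byte[] data = new byte[1 << 20];
        for (int i = 0; i < data.length; i++) data[i] = (byte) (i % 251);
        byte[] whole = deflateWhole(data);
        byte[] chunked = deflateChunked(data, 512);
        System.out.println(Arrays.equals(inflate(whole, data.length), data));
        System.out.println(Arrays.equals(inflate(chunked, data.length), data));
    }
}
```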
Created attachment 22263 [details]
extended performance comparison code
I've extended the test code, which compressed two big files (2 and 3 MB), to cover the case of many small files (2000 files of 2 or 3 kB), and covered reading as well. The big file compression case is actually worse on my machine (WinXP), where java.util.zip is more like 40 times faster. OTOH Ant wins in the small file case. Ant is slower when reading the ZIPs, but the performance difference isn't as bad.

==> Benchmarking big files
Apache write warmup done
Apache write: 147640 [ms]
JDK write warmup done
JDK write: 3219 [ms]
Apache read warmup done
Apache read: 453 [ms]
JDK Warmup done
JDK read: 125 [ms]

==> Benchmarking small files
Apache write warmup done
Apache write: 4406 [ms]
JDK write warmup done
JDK write: 6531 [ms]
Apache read warmup done
Apache read: 1859 [ms]
JDK Warmup done
JDK read: 1312 [ms]

I made the code compile on JDK 1.4 because I wanted to compare different JDKs. In the end the differences were so small I didn't include them here (JDK 6 was a bit faster for java.util.zip as well as in the Ant case). For reference, this is against Ant's Subversion revision 677166.
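The warmup-then-measure pattern behind the output above can be sketched generically. This is a sketch in the spirit of the attached comparison code, not the attachment itself; the Bench class, the entry names, and the payload sizes are illustrative:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

public class Bench {

    // Run the task untimed first (warmup), then time the measured runs.
    static long time(Runnable task, int warmups, int runs) {
        for (int i = 0; i < warmups; i++) task.run();
        long start = System.currentTimeMillis();
        for (int i = 0; i < runs; i++) task.run();
        return System.currentTimeMillis() - start;
    }

    // Example workload: the "many small files" case, here via java.util.zip.
    static void writeSmallFiles(int count, byte[] payload) {
        try {
            ZipOutputStream zos = new ZipOutputStream(new ByteArrayOutputStream());
            for (int i = 0; i < count; i++) {
                zos.putNextEntry(new ZipEntry("file" + i));
                zos.write(payload);
                zos.closeEntry();
            }
            zos.close();
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        byte[] payload = new byte[2048];
        long ms = time(() -> writeSmallFiles(200, payload), 1, 1);
        System.out.println("JDK write: " + ms + " [ms]");
    }
}
```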
Same machine, svn revision 677272:

==> Benchmarking big files
Apache write warmup done
Apache write: 3407 [ms]
JDK write warmup done
JDK write: 3297 [ms]
Apache read warmup done
Apache read: 422 [ms]
JDK Warmup done
JDK read: 125 [ms]

==> Benchmarking small files
Apache write warmup done
Apache write: 4438 [ms]
JDK write warmup done
JDK write: 6563 [ms]
Apache read warmup done
Apache read: 1844 [ms]
JDK Warmup done
JDK read: 1359 [ms]

Deflater seems to copy its input around, since I can see bigger memory consumption during the Ant code tests. There is no hint in the Javadocs, and I have no idea why chunking the original input should help - other than that it helps the native implementation of Sun's Deflater class.

I've searched through the zlib and InfoZIP code bases for any reference to good byte chunk sizes to pass to the compression library and found that InfoZIP's zip will use between 2 kB (SMALL_MEM) and 16 kB (LARGE_MEM). I've changed the code to use 8 kB blocks, which has the side effect of doing nothing when ZipOutputStream is used via <zip> and friends. Ant's tasks have always read the file content in 8 kB chunks and written those blocks to the ZipOutputStream - so Ant's tasks have never seen the poor performance for big files.
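The shape of the fix can be sketched as follows. This is a simplified, hypothetical sketch of the idea, not the actual org.apache.tools.zip.ZipOutputStream change; the class and constant names are mine:

```java
import java.io.ByteArrayOutputStream;
import java.util.zip.Deflater;

public class BlockedDeflaterSink {

    // 8 kB sits inside InfoZIP's range of 2 kB (SMALL_MEM) to
    // 16 kB (LARGE_MEM); the constant name is illustrative.
    private static final int DEFLATER_BLOCK_SIZE = 8192;

    private final Deflater def = new Deflater();
    private final ByteArrayOutputStream sink = new ByteArrayOutputStream();
    private final byte[] outBuf = new byte[4096];

    // Instead of handing the whole caller-supplied array to
    // Deflater.setInput() at once, feed it in 8 kB slices.
    public void write(byte[] b, int offset, int length) {
        while (length > 0) {
            int slice = Math.min(length, DEFLATER_BLOCK_SIZE);
            def.setInput(b, offset, slice);
            while (!def.needsInput()) {
                drainOnce();
            }
            offset += slice;
            length -= slice;
        }
    }

    public byte[] finish() {
        def.finish();
        while (!def.finished()) {
            drainOnce();
        }
        return sink.toByteArray();
    }

    private void drainOnce() {
        int n = def.deflate(outBuf);
        sink.write(outBuf, 0, n);
    }
}
```

Since Ant's own tasks have always written in 8 kB chunks anyway, this slicing only changes behavior for callers that pass large arrays to write() directly.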
Thank you for the quick fix!