When decoders read from a raw underlying stream (such as a file channel), performance can degrade by an order of magnitude compared to the case where a simple buffer sits between the physical data source and the codec.
See COMPRESS-380 for an example of this.
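To illustrate why the missing buffer matters, here is a minimal sketch using only java.io. It is not code from Commons Compress: BufferingDemo, CountingInputStream and rawReadCalls are hypothetical names, and the byte-at-a-time loop is only an assumption about how a bit-level decoder pulls its input. It counts how many read calls reach the raw stream with and without a 512-byte BufferedInputStream in between.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical demo class, not part of Commons Compress.
class BufferingDemo {

    /** Counts every read call that reaches the wrapped (raw) stream. */
    static final class CountingInputStream extends FilterInputStream {
        long calls;

        CountingInputStream(InputStream in) {
            super(in);
        }

        @Override
        public int read() throws IOException {
            calls++;
            return super.read();
        }

        @Override
        public int read(byte[] b, int off, int len) throws IOException {
            calls++;
            return super.read(b, off, len);
        }
    }

    /**
     * Consumes size bytes one at a time (an assumed decoder access pattern)
     * and returns how many read calls hit the raw stream.
     */
    static long rawReadCalls(int size, boolean buffered) throws IOException {
        CountingInputStream raw =
                new CountingInputStream(new ByteArrayInputStream(new byte[size]));
        InputStream in = buffered ? new BufferedInputStream(raw, 512) : raw;
        while (in.read() != -1) {
            // discard the data; only the call count matters here
        }
        return raw.calls;
    }

    public static void main(String[] args) throws IOException {
        int size = 1 << 20; // 1 MiB of dummy input
        System.out.println("unbuffered raw reads: " + rawReadCalls(size, false));
        System.out.println("buffered raw reads:   " + rawReadCalls(size, true));
    }
}
```

With an in-memory source each raw read is cheap, so this only shows the call counts. Against a file channel each of those calls is likely to be far more expensive, which is presumably where the slowdown comes from.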
The API of ZipFile is straightforward and tempting enough that code which simply reads each entry directly from the stream ZipFile returns seems perfectly justified. Yet such code suffers from severe performance degradation compared to prebuffered input. Severe means severe. Here are some stats from running a snippet of that kind to "just decompress" the same input (~80 MB) compressed with different methods.
Current master branch:
And with a simple patch that wraps BoundedInputStream in a BufferedInputStream (deflate64 and bzip2 only; deflate uses Java's built-in inflater, which already prebuffers its input internally).
The difference should be evident, even with a tiny buffer of 512 bytes. To put this into perspective on a larger archive:
deflate64 improves by ~4900%...
I also see that ExplodingInputStream already wraps bis in a buffered input stream, so I don't see any reason why this shouldn't be done for the other compressor streams. An even better patch (in my view) would be to modify the constructors of Deflate64CompressorInputStream and BZip2CompressorInputStream and add a boolean parameter, unbuffered; people would then know what they are doing when they pass an input stream and true to such a constructor. The default, single-argument constructor would simply delegate to constructor(inputStream, false), ensuring that an input buffer sits between the decoder and the raw stream.
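A rough sketch of the constructor shape I have in mind is below. SketchDecoderInputStream is a hypothetical stand-in, not the real Deflate64CompressorInputStream or BZip2CompressorInputStream; the 512-byte default and the isBuffered() helper are assumptions added for illustration, and the actual decoding is left out so that bytes pass straight through.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical stand-in for a decoder stream; no real decoding happens here. */
class SketchDecoderInputStream extends FilterInputStream {

    // Assumed default; the real classes may prefer a different size.
    static final int DEFAULT_BUFFER_SIZE = 512;

    /** Safe default: always places a buffer between the decoder and the raw stream. */
    SketchDecoderInputStream(InputStream in) {
        this(in, false);
    }

    /**
     * @param unbuffered pass true only if the caller already supplies a
     *                   buffered (or otherwise cheap to read) stream
     */
    SketchDecoderInputStream(InputStream in, boolean unbuffered) {
        super(unbuffered ? in : new BufferedInputStream(in, DEFAULT_BUFFER_SIZE));
    }

    /** Illustration-only helper that reports whether a buffer was added. */
    boolean isBuffered() {
        return in instanceof BufferedInputStream;
    }

    public static void main(String[] args) throws IOException {
        InputStream raw = new ByteArrayInputStream(new byte[] {1, 2, 3});
        try (SketchDecoderInputStream s = new SketchDecoderInputStream(raw)) {
            System.out.println("buffered by default: " + s.isBuffered());
        }
    }
}
```

The point of the explicit flag is that opting out of buffering becomes a visible decision at the call site, while the one-argument constructor stays safe for callers who pass a raw stream.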
The patch is trivial, so I haven't attached it.