So one possible way is to have CodecPool special-case the Gzip codec and do either of the following:
1) keep a map holding gzip codecs with different settings.
2) treat the setting as global, and when the setting changes, clear all gzip codecs cached in CodecPool.
Do these changes to CodecPool sound reasonable/acceptable?
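To make the two options concrete, here is a rough sketch of a pool keyed by settings. Every name in it (GzipKey, PooledGzip, KeyedGzipPool) is hypothetical and not taken from the patch or from CodecPool; it only assumes that the settings can be compared for equality.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical settings key; the real settings would come from the job configuration.
final class GzipKey {
    final int level;
    final int strategy;

    GzipKey(int level, int strategy) {
        this.level = level;
        this.strategy = strategy;
    }

    @Override public boolean equals(Object o) {
        if (!(o instanceof GzipKey)) return false;
        GzipKey k = (GzipKey) o;
        return level == k.level && strategy == k.strategy;
    }

    @Override public int hashCode() {
        return Objects.hash(level, strategy);
    }
}

// Stand-in for a pooled gzip compressor; it only remembers its settings.
final class PooledGzip {
    final GzipKey key;

    PooledGzip(GzipKey key) {
        this.key = key;
    }
}

final class KeyedGzipPool {
    private final Map<GzipKey, Deque<PooledGzip>> idle = new HashMap<>();

    // Option 1: reuse only an idle instance created with the same settings.
    PooledGzip borrow(GzipKey key) {
        Deque<PooledGzip> queue = idle.get(key);
        PooledGzip cached = (queue == null) ? null : queue.poll();
        return (cached != null) ? cached : new PooledGzip(key);
    }

    void giveBack(PooledGzip c) {
        idle.computeIfAbsent(c.key, k -> new ArrayDeque<>()).push(c);
    }

    // Option 2: treat the settings as global and drop everything when they change.
    void clear() {
        idle.clear();
    }
}
```

With option 1 the pool never hands out an instance built with the wrong settings; with option 2 the caller has to know when to call clear().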
I'm not sure the "clean" semantics have clear triggers (or they're not clear to me). I'd suggest adding an analog to end in the (Dec|C)ompressor interfaces that reinitializes a (de)compressor, then using it in the CodecPool: something like reinit, which destroys (with end) and recreates (with init) the underlying stream. This would be a better fix for HADOOP-5281, but it requires updates to other implementors of Compressor. Overloading CodecPool::getCompressor to take a Configuration and... well, tracing the implications through the rest of the Codec classes makes it easy to find where compressors are recycled. Calling reinit with parameters matching the current ones should be a no-op, and calling CodecPool::getCompressor without any arguments should use the default params.
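Roughly what I have in mind, as a sketch only. ZlibSettings, ReinitCompressor, SketchCompressor and SketchPool are made-up names standing in for a Configuration, the Compressor interface, ZlibCompressor and CodecPool; the default values and the initCount counter are assumptions added purely for illustration.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Objects;

// Hypothetical stand-in for the parameters that would be read from a Configuration.
final class ZlibSettings {
    final int level;
    final int strategy;

    ZlibSettings(int level, int strategy) {
        this.level = level;
        this.strategy = strategy;
    }

    @Override public boolean equals(Object o) {
        if (!(o instanceof ZlibSettings)) return false;
        ZlibSettings s = (ZlibSettings) o;
        return level == s.level && strategy == s.strategy;
    }

    @Override public int hashCode() {
        return Objects.hash(level, strategy);
    }
}

// Hypothetical slice of a compressor interface with the proposed reinit.
interface ReinitCompressor {
    void end();
    void reinit(ZlibSettings settings);
}

final class SketchCompressor implements ReinitCompressor {
    // Assumed defaults, not the real ones.
    static final ZlibSettings DEFAULTS = new ZlibSettings(-1, 0);

    private ZlibSettings current;
    int initCount = 0; // how many times the underlying stream was (re)created

    SketchCompressor(ZlibSettings settings) {
        init(settings);
    }

    // Private, so it cannot be overridden and is safe to call from the constructor.
    private void init(ZlibSettings settings) {
        current = settings;
        initCount++;
    }

    @Override public void end() {
        current = null; // the real code would free the native stream here
    }

    @Override public void reinit(ZlibSettings settings) {
        if (settings.equals(current)) {
            return; // matching parameters: no-op
        }
        end();
        init(settings);
    }

    ZlibSettings settings() {
        return current;
    }
}

final class SketchPool {
    private final Deque<SketchCompressor> idle = new ArrayDeque<>();

    // No arguments: use the default params.
    SketchCompressor getCompressor() {
        return getCompressor(SketchCompressor.DEFAULTS);
    }

    // Overload taking settings: a recycled instance is reinitialized before reuse.
    SketchCompressor getCompressor(ZlibSettings settings) {
        SketchCompressor c = idle.poll();
        if (c == null) {
            return new SketchCompressor(settings);
        }
        c.reinit(settings);
        return c;
    }

    void returnCompressor(SketchCompressor c) {
        idle.push(c);
    }
}
```

The pool stays a single queue; the only place that needs to know about settings is the point where a recycled compressor is handed back out.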
Since this is a fair amount of work, if you wanted to narrow the issue to be global settings for GzipCodec, then an approach like that in the current patch is probably sufficient for many applications.
Quick asides on the current patch: ZlibCompressor::construct should be final; if it were overridden in a subclass, the base constructor would call the subclass override on a partially constructed object. Also, since the parameters are specific to GzipCodec, they should not have generic names like "io.compress.level".
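To illustrate the constructor point with a toy example (the class names are hypothetical and unrelated to the actual ZlibCompressor code): the override runs before the subclass fields are assigned, so it sees default values.

```java
class UnsafeBase {
    String seen; // records what construct() observed

    UnsafeBase() {
        construct(); // overridable call from a constructor
    }

    void construct() {
        seen = "base";
    }
}

class UnsafeDerived extends UnsafeBase {
    private final int level;

    UnsafeDerived(int level) {
        super();            // runs the override below before level is assigned
        this.level = level;
    }

    @Override void construct() {
        seen = "level=" + level; // level still holds its default value, 0
    }

    int level() {
        return level;
    }
}

class SafeBase {
    String seen;

    SafeBase() {
        construct();
    }

    // final: subclasses cannot intercept the call made from the constructor
    final void construct() {
        seen = "base";
    }
}
```

Here new UnsafeDerived(9).seen ends up as "level=0" even though level() later returns 9. Marking construct final (or private) rules this out.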