Hadoop Common / HADOOP-5879

GzipCodec should read compression level etc from configuration

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.21.0
    • Component/s: io
    • Labels: None
    • Hadoop Flags:
      Incompatible change, Reviewed
    • Release Note:
      Provide the ability to configure the compression level and strategy for codecs. Compressors need to be 'reinited' with new characteristics such as the compression level, and hence this is an incompatible addition to the API.

      Description

      GzipCodec currently uses the default compression level. We should allow overriding the default value from Configuration.

        static final class GzipZlibCompressor extends ZlibCompressor {
          public GzipZlibCompressor() {
            super(ZlibCompressor.CompressionLevel.DEFAULT_COMPRESSION,
                ZlibCompressor.CompressionStrategy.DEFAULT_STRATEGY,
                ZlibCompressor.CompressionHeader.GZIP_FORMAT, 64*1024);
          }
        }
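
      A minimal sketch of the direction this issue proposes (hypothetical; the Configuration-reading helpers and constructor shown here are illustrative, not necessarily the committed API): the compressor would take a Configuration and read the level and strategy from it instead of hard-coding the defaults.

        static final class GzipZlibCompressor extends ZlibCompressor {
          public GzipZlibCompressor(Configuration conf) {
            // Assumed helpers that map conf keys to the zlib enums.
            super(ZlibFactory.getCompressionLevel(conf),
                ZlibFactory.getCompressionStrategy(conf),
                ZlibCompressor.CompressionHeader.GZIP_FORMAT, 64*1024);
          }
        }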
      
      1. H5879-5.patch
        16 kB
        Chris Douglas
      2. hadoop-5879-5-21.patch
        4 kB
        He Yongqiang
      3. hadoop-5879-7-13-2.patch
        8 kB
        He Yongqiang
      4. hadoop-5879-7-13-3.patch
        8 kB
        He Yongqiang
      5. hadoop-5879-7-18-3.patch
        13 kB
        He Yongqiang
      6. hadoop-5879-7-26.patch
        14 kB
        He Yongqiang
      7. hadoop-5879-yahoo-0.20-v1.0.patch
        16 kB
        Amar Kamat

        Activity

        Amar Kamat added a comment -

        Attaching a patch for Yahoo!'s distribution of hadoop 0.20, not to be committed here.

        Hudson added a comment -

        Integrated in Hadoop-Common-trunk #96 (See http://hudson.zones.apache.org/hudson/job/Hadoop-Common-trunk/96/)

        Hudson added a comment -

        Integrated in Hadoop-Common-trunk-Commit #29 (See http://hudson.zones.apache.org/hudson/job/Hadoop-Common-trunk-Commit/29/)
        HADOOP-5879. Read compression level and strategy from Configuration for gzip compression. Contributed by He Yongqiang.

        Chris Douglas added a comment -

        +1

        I committed this. Thanks, He!

        Chris Douglas added a comment -

        Hudson isn't picking up the patch, so I ran it on my machine:

             [exec] +1 overall.  
             [exec] 
             [exec]     +1 @author.  The patch does not contain any @author tags.
             [exec] 
             [exec]     +1 tests included.  The patch appears to include 3 new or modified tests.
             [exec] 
             [exec]     +1 javadoc.  The javadoc tool did not generate any warning messages.
             [exec] 
             [exec]     +1 javac.  The applied patch does not increase the total number of javac compiler warnings.
             [exec] 
             [exec]     +1 findbugs.  The patch does not introduce any new Findbugs warnings.
             [exec] 
             [exec]     +1 release audit.  The applied patch does not increase the total number of release audit warnings.
        

        All unit tests passed with and without native libs installed.

        Chris Douglas added a comment -
        • Merged with trunk
        • Calls end() in reinit, avoiding a native-memory leak (see the sketch after this list)
        • Removes construct and resetStrategyAndCompressionLevel, which caused findbugs warnings. Relies on the invariant windowBits set in the constructor for Default/ZlibCompressors to avoid a reemergence of HADOOP-5281
        • Updated unit test to verify that compression level is reset
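
        A minimal sketch of the leak fix described above (hypothetical shape, assuming ZlibCompressor's native init/end methods and the ZlibFactory conf helpers; not necessarily the committed code):

          public synchronized void reinit(Configuration conf) {
            reset();
            if (conf != null) {
              level = ZlibFactory.getCompressionLevel(conf);
              strategy = ZlibFactory.getCompressionStrategy(conf);
              // Free the old native zlib stream before recreating it,
              // so repeated reinits do not leak native memory.
              end(stream);
              stream = init(level.compressionLevel(),
                  strategy.compressionStrategy(), windowBits);
            }
          }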
        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12414549/hadoop-5879-7-26.patch
        against trunk revision 807753.

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 3 new or modified tests.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        -1 findbugs. The patch appears to introduce 4 new Findbugs warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 core tests. The patch passed core unit tests.

        +1 contrib tests. The patch passed contrib unit tests.

        Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/629/testReport/
        Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/629/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/629/artifact/trunk/build/test/checkstyle-errors.html
        Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/629/console

        This message is automatically generated.

        He Yongqiang added a comment -

        hadoop-5879-7-26.patch incorporates all of Chris's suggestions (thanks for the detailed comments, Chris!).
        I have to admit that the test case cannot fully verify the functionality. To confirm that it works, I inspected the logs and debugged in my IDE. Please let me know if you know how we can test it automatically.

        Chris Douglas added a comment -

        Thanks for updating the patch.

        • Rather than having special semantics for construct, I'd suggest removing the directBufferSize formal parameter from construct and returning the allocation to the constructor
        • Sorry, my last comment was not clear. By "implementations should also describe..." I meant that the classes implementing reinit should describe its effects in their javadoc for reinit, as you did in BuiltInZlibDeflater. I didn't mean that more logging was required. Compressor::reinit should also describe the contract for future implementers, "Prepare the compressor to be used in a new stream with settings defined in the given Configuration" or something like that (see the sketch after this list)
        • I think it's appropriate to fail if the configuration is invalid rather than taking the default in ZlibFactory::getCompression* (why ZlibFactory?). I don't think the getDefault* methods are necessary. setCompression* should take the appropriate enum, rather than String. Filed HADOOP-6161 for get/setEnum to simplify some of the conf-related code
        • In ZlibCompressor::reinit, reset is not called if the Configuration is null. For users calling these methods not via CodecPool, reset() should probably be called
        • CompressionLevel::compressionLevel() and CompressionStrategy::compressionStrategy() should remain package-private; the integers are implementation details
        • The unit test doesn't really verify the functionality added by this patch, save that it doesn't throw exceptions. That said, it would be hard to verify that this is working as expected without adding get* methods to ZlibCompressor. Can you describe how you verified the patch's correctness, both for the native and non-native codecs?
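
        A sketch of the contract javadoc suggested above (the wording is illustrative, assembled from this thread, not quoted from the committed patch):

          public interface Compressor {
            // ... other Compressor methods elided ...

            /**
             * Prepare the compressor to be used in a new stream with settings
             * defined in the given Configuration. A null Configuration, or one
             * matching the compressor's current settings, should amount to a
             * reset() and nothing more.
             */
            void reinit(Configuration conf);
          }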
        He Yongqiang added a comment -

        A new patch incorporates Chris's comments (thanks, Chris!).

        Chris Douglas added a comment -

        Whoops; ZlibCompression isn't a class. Maybe setCompressionLevel(conf, value) should be added to GzipCodec instead.

        Chris Douglas added a comment -

        The latest patch looks good. A few notes:

        • BuiltInZlibDeflater can use the setLevel and setStrategy methods on java.util.zip.Deflater to implement reset (see the sketch after this list)
        • This should definitely have a unit test, particularly involving CodecPool
        • The javadoc for reinit should describe the contract for that method; the implementations should also describe what happens for that particular codec.
        • In ZlibCompression::reinit, shouldn't end() be called before reconfiguring the codec?
        • The direct buffer size should not be reconfigured. Codecs are pooled because recreating direct buffers is very expensive; allocating new direct buffers on a cache hit defeats the purpose of pooling them. The buffer size doesn't affect correctness of zlib compression.
        • This should add methods like ZlibCompression::setCompressionLevel(Configuration, <value>) to set the zlib.compress.* properties
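
        For the first note, a minimal sketch of how BuiltInZlibDeflater could lean on java.util.zip.Deflater (the ZlibFactory helpers are assumed from this thread's proposal, not verified against the final patch; note Deflater only accepts the FILTERED, HUFFMAN_ONLY, and DEFAULT_STRATEGY strategy values):

          public synchronized void reinit(Configuration conf) {
            reset();
            if (conf == null) {
              return;
            }
            // Deflater exposes public setLevel/setStrategy taking zlib ints.
            setLevel(ZlibFactory.getCompressionLevel(conf).compressionLevel());
            setStrategy(ZlibFactory.getCompressionStrategy(conf).compressionStrategy());
          }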
        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12413291/hadoop-5879-7-13-3.patch
        against trunk revision 793162.

        +1 @author. The patch does not contain any @author tags.

        -1 tests included. The patch doesn't appear to include any new or modified tests.
        Please justify why no new tests are needed for this patch.
        Also please list what manual steps were performed to verify this patch.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        -1 findbugs. The patch appears to introduce 4 new Findbugs warnings.

        -1 release audit. The applied patch generated 266 release audit warnings (more than the trunk's current 260 warnings).

        +1 core tests. The patch passed core unit tests.

        +1 contrib tests. The patch passed contrib unit tests.

        Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/572/testReport/
        Release audit warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/572/artifact/trunk/current/releaseAuditDiffWarnings.txt
        Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/572/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/572/artifact/trunk/build/test/checkstyle-errors.html
        Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/572/console

        This message is automatically generated.

        He Yongqiang added a comment -

        hadoop-5879-7-13-3.patch changes the conf strings from io.compress.* to zlib.compress.*

        Supported conf strings are:
        • zlib.compress.level: values can be NO_COMPRESSION / BEST_SPEED / BEST_COMPRESSION / DEFAULT_COMPRESSION
        • zlib.compress.strategy: values can be FILTERED / HUFFMAN_ONLY / RLE / FIXED / DEFAULT_STRATEGY
        • zlib.compress.buffer.size: values can be any positive integer
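
        For illustration, setting these keys in a job Configuration would look like this (values taken from the list above; a usage snippet, not from the patch):

          Configuration conf = new Configuration();
          conf.set("zlib.compress.level", "BEST_SPEED");
          conf.set("zlib.compress.strategy", "HUFFMAN_ONLY");
          conf.set("zlib.compress.buffer.size", "65536");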

        He Yongqiang added a comment -

        Uploaded a new patch integrating Chris's suggestions (thanks, Chris).

        Chris Douglas added a comment -

        > So one possible way is to let CodecPool do something special for the Gzip codec, and either
        > 1) keep a map holding gzip codecs with different settings, or
        > 2) treat the setting as a global setting, and when the setting is changed, clean all gzip codecs cached in CodecPool.
        >
        > Do these changes to CodecPool sound reasonable/acceptable?

        I'm not sure the "clean" semantics have clear triggers (or they're not clear to me). I'd suggest an analog to end in the (Dec|C)ompressor interface that reinitializes a (de)compressor, then use those interfaces in the CodecPool. This would be a better fix for HADOOP-5281, but it requires updates to other implementors of Compressor. Something like reinit that destroys (with end) and recreates (with init) the underlying stream. Overloading CodecPool::getCompressor to take a Configuration and... well, tracing the implications through the rest of the Codec classes makes it easy to trace where compressors are recycled. Calling reinit with parameters matching the current ones should be a noop and calling CodecPool::getCompressor without any arguments should use default params.

        Since this is a fair amount of work, if you wanted to narrow the issue to be global settings for GzipCodec, then an approach like that in the current patch is probably sufficient for many applications.
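
        A hypothetical shape of the getCompressor overload described above (the borrow helper and pool field are assumed CodecPool internals, shown only to make the flow concrete):

          public static Compressor getCompressor(CompressionCodec codec, Configuration conf) {
            Compressor compressor = borrow(compressorPool, codec.getCompressorType());
            if (compressor == null) {
              compressor = codec.createCompressor();
            } else {
              // A no-op when conf matches the compressor's current settings.
              compressor.reinit(conf);
            }
            return compressor;
          }

          // The existing form would keep the default parameters:
          public static Compressor getCompressor(CompressionCodec codec) {
            return getCompressor(codec, null);
          }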

        Quick asides on the current patch: ZlibCompressor::construct should be final; if overridden in a subclass, the partially created object would call the subclass instance from the base constructor. Also, since the parameters are specific to GzipCodec, they should not have generic names like "io.compress.level".

        He Yongqiang added a comment -

        The failed test is not related to this patch.
        However, what Chris commented is right. The current patch cannot guarantee that the Gzip compressor obtained from CodecPool has the settings users expect. These are effectively global settings, yet when we change them, CodecPool does not clean its cached codecs that still carry the old settings, so things may work in a wrong way.
        So one possible way is to let CodecPool do something special for the Gzip codec, and either
        1) keep a map holding gzip codecs with different settings, or
        2) treat the setting as a global setting, and when the setting is changed, clean all gzip codecs cached in CodecPool.

        Do these changes to CodecPool sound reasonable/acceptable?

        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12408699/hadoop-5879-5-21.patch
        against trunk revision 778921.

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 3 new or modified tests.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        -1 findbugs. The patch appears to introduce 4 new Findbugs warnings.

        +1 Eclipse classpath. The patch retains Eclipse classpath integrity.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        -1 core tests. The patch failed core unit tests.

        -1 contrib tests. The patch failed contrib unit tests.

        Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/406/testReport/
        Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/406/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/406/artifact/trunk/build/test/checkstyle-errors.html
        Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/406/console

        This message is automatically generated.

        Chris Douglas added a comment -

        Sorry, I didn't mean that it would fail or not change the setting. I meant that, since these parameters are specified in the constructor, there's no place where these parameters are reset when a compressor is pulled from the pool. HADOOP-5281 is an instance of a bug following this pattern. If the intent is to create all Gzip codecs with the same settings, then OK.

        He Yongqiang added a comment (edited) -

        I checked the code; it will work with SequenceFile. For example:

          import java.io.IOException;

          import org.apache.hadoop.conf.Configuration;
          import org.apache.hadoop.fs.FileSystem;
          import org.apache.hadoop.fs.Path;
          import org.apache.hadoop.io.NullWritable;
          import org.apache.hadoop.io.SequenceFile;
          import org.apache.hadoop.io.SequenceFile.CompressionType;
          import org.apache.hadoop.io.compress.GzipCodec;

          public static void testSequenceFileWithGZipCodec() throws IOException {
            // Gzip codec parameters as numeric zlib values; this early patch
            // used io.compress.*, later renamed to zlib.compress.*.
            Configuration confWithParam = new Configuration();
            confWithParam.set("io.compress.level", "1");
            confWithParam.set("io.compress.strategy", "0");
            confWithParam.set("io.compress.buffer.size", Integer.toString(128 * 1024));
            FileSystem fs = FileSystem.get(confWithParam);
            GzipCodec codec = new GzipCodec();
            codec.setConf(confWithParam); // this line is not needed for creating SequenceFile.Writer
            SequenceFile.Writer writer = SequenceFile.createWriter(fs, confWithParam,
                new Path("/test/path"), NullWritable.class, NullWritable.class,
                CompressionType.BLOCK, codec);
            writer.close();
          }
        

        The call trace for getting a compressor is:
        CodecPool.getCompressor(CompressionCodec) -> GzipCodec.createCompressor() -> new GzipZlibCompressor(conf).
        I have not checked IFile, but I think it should work in the same way.

        Chris Douglas added a comment -

        Both SequenceFile and IFile use o.a.h.io.compress.CodecPool to pool codecs. This is unlikely to work as expected there.

        He Yongqiang added a comment -

        A first try. Only tested with the TestCodec test case on my local Linux box.


          People

          • Assignee: He Yongqiang
          • Reporter: Zheng Shao
          • Votes: 1
          • Watchers: 4
