Hadoop Common
  1. Hadoop Common
  2. HADOOP-6315

GzipCodec should not represent BuiltInZlibInflater as decompressorType

    Details

    • Type: Bug Bug
    • Status: Resolved
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.20.2
    • Component/s: io
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      It is possible to pollute CodecPool in such a way that Hadoop cannot read gzip-compressed data.

      1. HADOOP-6315.patch
        6 kB
        Aaron Kimball
      2. HADOOP-6315.3.patch
        9 kB
        Aaron Kimball
      3. HADOOP-6315.2.patch
        6 kB
        Aaron Kimball

        Activity

        Hide
        Hudson added a comment -

        Integrated in Hadoop-Common-trunk #217 (See http://hudson.zones.apache.org/hudson/job/Hadoop-Common-trunk/217/)
        . Avoid incorrect use of BuiltInflater/BuiltInDeflater in
        GzipCodec. Contributed by Aaron Kimball

        Show
        Hudson added a comment - Integrated in Hadoop-Common-trunk #217 (See http://hudson.zones.apache.org/hudson/job/Hadoop-Common-trunk/217/ ) . Avoid incorrect use of BuiltInflater/BuiltInDeflater in GzipCodec. Contributed by Aaron Kimball
        Hide
        Hudson added a comment -

        Integrated in Hadoop-Common-trunk-Commit #139 (See http://hudson.zones.apache.org/hudson/job/Hadoop-Common-trunk-Commit/139/)
        . Avoid incorrect use of BuiltInflater/BuiltInDeflater in
        GzipCodec. Contributed by Aaron Kimball

        Show
        Hudson added a comment - Integrated in Hadoop-Common-trunk-Commit #139 (See http://hudson.zones.apache.org/hudson/job/Hadoop-Common-trunk-Commit/139/ ) . Avoid incorrect use of BuiltInflater/BuiltInDeflater in GzipCodec. Contributed by Aaron Kimball
        Hide
        Chris Douglas added a comment -

        I committed this. Thanks, Aaron!

        Show
        Chris Douglas added a comment - I committed this. Thanks, Aaron!
        Hide
        Chris Douglas added a comment -

        HADOOP-5281 carried over the semantics from the existing code, i.e. returned BuiltInZlibInflater when the native libs aren't loaded:

        -    return ZlibFactory.getZlibDecompressorType(conf);
        +    return ZlibFactory.isNativeZlibLoaded(conf)
        +      ? GzipZlibDecompressor.class
        +      : BuiltInZlibInflater.class;
        

        which made the fix incomplete. Unfortunately, extending HADOOP-5281 to support the standard platform (i.e. writing the corresponding BuiltInGzipInflater/Deflater passing nowrap == true) isn't possible either, since the GZIP*Streams don't expose constructors taking Inflaters/Deflaters (so it's not possible to write a .gz file with settings available on a Deflater). Given that, the current patch makes it impossible to reuse them, which is regrettably correct.

        +1

        Show
        Chris Douglas added a comment - HADOOP-5281 carried over the semantics from the existing code, i.e. returned BuiltInZlibInflater when the native libs aren't loaded: - return ZlibFactory.getZlibDecompressorType(conf); + return ZlibFactory.isNativeZlibLoaded(conf) + ? GzipZlibDecompressor.class + : BuiltInZlibInflater.class; which made the fix incomplete. Unfortunately, extending HADOOP-5281 to support the standard platform (i.e. writing the corresponding BuiltInGzipInflater/Deflater passing nowrap == true) isn't possible either, since the GZIP*Streams don't expose constructors taking Inflaters/Deflaters (so it's not possible to write a .gz file with settings available on a Deflater). Given that, the current patch makes it impossible to reuse them, which is regrettably correct. +1
        Hide
        Aaron Kimball added a comment -

        Looking at the 0.20 source, it appears that this bug will manifest there as well. I think this should be applied to 0.20, 0.21, and trunk.

        Show
        Aaron Kimball added a comment - Looking at the 0.20 source, it appears that this bug will manifest there as well. I think this should be applied to 0.20, 0.21, and trunk.
        Hide
        Tom White added a comment -

        It would be good to have a contributor from HADOOP-5281 comment on this patch, since HADOOP-5281 was marked as a blocker for 0.20. Does the current fix need to go into 0.20 too?

        Show
        Tom White added a comment - It would be good to have a contributor from HADOOP-5281 comment on this patch, since HADOOP-5281 was marked as a blocker for 0.20. Does the current fix need to go into 0.20 too?
        Hide
        Hadoop QA added a comment -

        +1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12428103/HADOOP-6315.3.patch
        against trunk revision 890964.

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 3 new or modified tests.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 findbugs. The patch does not introduce any new Findbugs warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 core tests. The patch passed core unit tests.

        +1 contrib tests. The patch passed contrib unit tests.

        Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h1.grid.sp2.yahoo.net/29/testReport/
        Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h1.grid.sp2.yahoo.net/29/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h1.grid.sp2.yahoo.net/29/artifact/trunk/build/test/checkstyle-errors.html
        Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h1.grid.sp2.yahoo.net/29/console

        This message is automatically generated.

        Show
        Hadoop QA added a comment - +1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12428103/HADOOP-6315.3.patch against trunk revision 890964. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 3 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h1.grid.sp2.yahoo.net/29/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h1.grid.sp2.yahoo.net/29/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h1.grid.sp2.yahoo.net/29/artifact/trunk/build/test/checkstyle-errors.html Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h1.grid.sp2.yahoo.net/29/console This message is automatically generated.
        Hide
        Aaron Kimball added a comment -

        Yes, I think that patch is what caused the issue. I have added a further testcase for the Compressor analogue; the same problem manifests there, so I have also made the corresponding change to GzipCodec.java.

        Show
        Aaron Kimball added a comment - Yes, I think that patch is what caused the issue. I have added a further testcase for the Compressor analogue; the same problem manifests there, so I have also made the corresponding change to GzipCodec.java.
        Hide
        Tom White added a comment -

        Does getCompressorType() need to change too? Is this related to the change that went in to HADOOP-5281?

        Show
        Tom White added a comment - Does getCompressorType() need to change too? Is this related to the change that went in to HADOOP-5281 ?
        Hide
        Aaron Kimball added a comment -

        Some of my Sqoop test cases are broken without this fix. Can we apply this to trunk yet? Chris?

        Show
        Aaron Kimball added a comment - Some of my Sqoop test cases are broken without this fix. Can we apply this to trunk yet? Chris?
        Hide
        Hadoop QA added a comment -

        +1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12422309/HADOOP-6315.2.patch
        against trunk revision 824942.

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 3 new or modified tests.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 findbugs. The patch does not introduce any new Findbugs warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 core tests. The patch passed core unit tests.

        +1 contrib tests. The patch passed contrib unit tests.

        Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h4.grid.sp2.yahoo.net/89/testReport/
        Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h4.grid.sp2.yahoo.net/89/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h4.grid.sp2.yahoo.net/89/artifact/trunk/build/test/checkstyle-errors.html
        Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h4.grid.sp2.yahoo.net/89/console

        This message is automatically generated.

        Show
        Hadoop QA added a comment - +1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12422309/HADOOP-6315.2.patch against trunk revision 824942. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 3 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h4.grid.sp2.yahoo.net/89/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h4.grid.sp2.yahoo.net/89/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h4.grid.sp2.yahoo.net/89/artifact/trunk/build/test/checkstyle-errors.html Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h4.grid.sp2.yahoo.net/89/console This message is automatically generated.
        Hide
        Aaron Kimball added a comment -

        New patch without 'extends TestCase'

        Show
        Aaron Kimball added a comment - New patch without 'extends TestCase'
        Hide
        Chris Douglas added a comment -

        Yes. In our JUnit3 tests, we extend TestCase and follow naming conventions for those methods. JUnit4 works with annotations. We've been gradually converting the unit tests to use the latter.

        Show
        Chris Douglas added a comment - Yes. In our JUnit3 tests, we extend TestCase and follow naming conventions for those methods. JUnit4 works with annotations. We've been gradually converting the unit tests to use the latter.
        Hide
        Aaron Kimball added a comment -

        Chris: Virtually every other Test*.java class "extends TestCase" (80 out of 101 Test*.java classes in src/test/core/). I just assumed that the lack of TestCase was an error and figured I'd tidy up the oversight. Should I revert that line?

        Show
        Aaron Kimball added a comment - Chris: Virtually every other Test*.java class "extends TestCase" (80 out of 101 Test*.java classes in src/test/core/). I just assumed that the lack of TestCase was an error and figured I'd tidy up the oversight. Should I revert that line?
        Hide
        Todd Lipcon added a comment -

        +1 looks good to me.

        I looked over the patch and test cases, then applied it and made sure it passed. I then reverted the fix and verified that the new test cases failed.

        Show
        Todd Lipcon added a comment - +1 looks good to me. I looked over the patch and test cases, then applied it and made sure it passed. I then reverted the fix and verified that the new test cases failed.
        Hide
        Chris Douglas added a comment -

        Why does TestCodec extend TestCase, reverting to JUnit3 semantics, when this also employs the JUnit4 annotations?

        Show
        Chris Douglas added a comment - Why does TestCodec extend TestCase, reverting to JUnit3 semantics, when this also employs the JUnit4 annotations?
        Hide
        Hadoop QA added a comment -

        +1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12422281/HADOOP-6315.patch
        against trunk revision 824942.

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 3 new or modified tests.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 findbugs. The patch does not introduce any new Findbugs warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 core tests. The patch passed core unit tests.

        +1 contrib tests. The patch passed contrib unit tests.

        Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h4.grid.sp2.yahoo.net/88/testReport/
        Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h4.grid.sp2.yahoo.net/88/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h4.grid.sp2.yahoo.net/88/artifact/trunk/build/test/checkstyle-errors.html
        Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h4.grid.sp2.yahoo.net/88/console

        This message is automatically generated.

        Show
        Hadoop QA added a comment - +1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12422281/HADOOP-6315.patch against trunk revision 824942. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 3 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h4.grid.sp2.yahoo.net/88/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h4.grid.sp2.yahoo.net/88/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h4.grid.sp2.yahoo.net/88/artifact/trunk/build/test/checkstyle-errors.html Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h4.grid.sp2.yahoo.net/88/console This message is automatically generated.
        Hide
        Aaron Kimball added a comment -

        GzipCodec does not use BuiltInZlibInflater as its decompressor; createDecompressor() returns null. But getDecompressorType() returns BuiltInZlibInflater as its inflater type, which causes CodecPool to use BuiltInZlibInflater for GzipCodec.getDecompressor(). This inflater does not work on .gz files.

        This patch fixes the bug. Also includes two test cases that demonstrate the problem.

        Show
        Aaron Kimball added a comment - GzipCodec does not use BuiltInZlibInflater as its decompressor; createDecompressor() returns null. But getDecompressorType() returns BuiltInZlibInflater as its inflater type, which causes CodecPool to use BuiltInZlibInflater for GzipCodec.getDecompressor(). This inflater does not work on .gz files. This patch fixes the bug. Also includes two test cases that demonstrate the problem.

          People

          • Assignee:
            Aaron Kimball
            Reporter:
            Aaron Kimball
          • Votes:
            0 Vote for this issue
            Watchers:
            6 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development