Hadoop Common / HADOOP-6663

BlockDecompressorStream gets EOF exception when decompressing a file compressed from an empty file

    Details

    • Hadoop Flags:
      Reviewed
    • Release Note:
      Fix EOF exception in BlockDecompressorStream when decompressing a previously compressed empty file
    • Tags:
      hadoop io compress BlockDecompressorStream

      Description

      An empty file can be compressed using BlockCompressorStream, which is for block-based compression algorithms such as LZO. However, when decompressing the compressed file, BlockDecompressorStream gets an EOF exception.

      Here is a typical exception stack:

      java.io.EOFException
      at org.apache.hadoop.io.compress.BlockDecompressorStream.rawReadInt(BlockDecompressorStream.java:125)
      at org.apache.hadoop.io.compress.BlockDecompressorStream.getCompressedData(BlockDecompressorStream.java:96)
      at org.apache.hadoop.io.compress.BlockDecompressorStream.decompress(BlockDecompressorStream.java:82)
      at org.apache.hadoop.io.compress.DecompressorStream.read(DecompressorStream.java:74)
      at java.io.InputStream.read(InputStream.java:85)
      at org.apache.hadoop.util.LineReader.readLine(LineReader.java:134)
      at org.apache.hadoop.mapred.LineRecordReader.next(LineRecordReader.java:134)
      at org.apache.hadoop.mapred.LineRecordReader.next(LineRecordReader.java:39)
      at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.moveToNext(MapTask.java:186)
      at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:170)
      at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:48)
      at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:18)
      at org.apache.hadoop.mapred.MapTask.run(MapTask.java:334)
      at org.apache.hadoop.mapred.Child.main(Child.java:196)

      1. BlockDecompressorStream.patch
        0.6 kB
        Kang Xiao
      2. BlockDecompressorStream.java.patch
        6 kB
        Kang Xiao
      3. BlockDecompressorStream.java.patch
        6 kB
        Kang Xiao
      4. HADOOP-6663.patch
        6 kB
        Tom White
      5. HADOOP-6663-0.20.2.patch
        0.6 kB
        Bennie Schut

        Activity

        Kang Xiao added a comment -

        The EOF exception is caused as follows:

        BlockCompressorStream compresses the input data block by block. For each block, the uncompressed block length is first written to the underlying output stream, followed by compressed chunks, each consisting of a chunk length and the compressed chunk data.

        So when compressing an empty file, BlockCompressorStream writes an int of 0 to the underlying output stream, without any following chunks.

        BlockDecompressorStream decompresses the underlying compressed input stream block by block. For each block, it reads the uncompressed block length and then reads the chunk length and compressed chunk.

        So when decompressing a previously compressed empty file, BlockDecompressorStream reads 0 and then gets an EOF exception trying to read the chunk length.
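
        The framing described above can be reproduced in a minimal sketch without Hadoop. The class and method names below (EmptyBlockFramingDemo, writeBlock, readBlockNaive) are hypothetical, the "compression" is the identity function, and each block carries at most one chunk; it only illustrates why a reader that always expects a chunk length fails on the 4-byte output of an empty file.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;

// Hypothetical stand-in for the block framing described in the comment;
// these are not the Hadoop classes.
class EmptyBlockFramingDemo {

  // Writes one block: the uncompressed length, then (only if there is data)
  // a single chunk as [chunk length][chunk bytes]. No real compression is done.
  static byte[] writeBlock(byte[] data) throws IOException {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    DataOutputStream out = new DataOutputStream(bytes);
    out.writeInt(data.length);
    if (data.length > 0) {
      out.writeInt(data.length);
      out.write(data);
    }
    out.flush();
    return bytes.toByteArray();
  }

  // Naive reader: always expects a chunk length after the block length.
  static byte[] readBlockNaive(byte[] framed) throws IOException {
    DataInputStream in = new DataInputStream(new ByteArrayInputStream(framed));
    in.readInt();                   // uncompressed block length (0 for an empty file)
    int chunkLength = in.readInt(); // nothing left to read for an empty file
    byte[] chunk = new byte[chunkLength];
    in.readFully(chunk);
    return chunk;
  }

  public static void main(String[] args) throws IOException {
    byte[] framed = writeBlock(new byte[0]);
    System.out.println("framed length = " + framed.length);
    try {
      readBlockNaive(framed);
      System.out.println("no exception");
    } catch (EOFException e) {
      System.out.println("EOFException while reading the chunk length");
    }
  }
}
```

        In this sketch the empty input produces only the four bytes of the zero block length, so the second readInt() has nothing to read.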

        Kang Xiao added a comment -

        Patch attached.

        Return -1 to indicate EOF when reading a 0 block length.
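
        The idea can be sketched as follows. This is not the attached patch: ZeroLengthBlockSketch and readBlock are hypothetical names, the chunk data is copied rather than decompressed, and the sketch assumes one chunk per block and a destination buffer large enough to hold it.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;

// Hypothetical illustration of treating a zero block length as end of stream.
class ZeroLengthBlockSketch {

  // Returns the number of bytes copied into dest, or -1 for an empty block.
  static int readBlock(DataInputStream in, byte[] dest) throws IOException {
    int uncompressedLength = in.readInt();
    if (uncompressedLength == 0) {
      // A zero block length means the original file was empty:
      // report EOF instead of trying to read a chunk length.
      return -1;
    }
    int chunkLength = in.readInt();
    in.readFully(dest, 0, chunkLength); // assumes dest.length >= chunkLength
    return chunkLength;
  }

  public static void main(String[] args) throws IOException {
    byte[] emptyFile = {0, 0, 0, 0}; // only the zero block length
    DataInputStream in = new DataInputStream(new ByteArrayInputStream(emptyFile));
    System.out.println(readBlock(in, new byte[16])); // prints -1
  }
}
```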

        Kang Xiao added a comment -

        New patch attached, including test case.

        Todd Lipcon added a comment -

        +1, I've seen this issue in production as well. The fix and test case look good, except please add the Apache header to the test case, and preferably update the test case to JUnit 4 style.

        Kang Xiao added a comment -

        Thank you for your advice.

        New patch attached with Apache header and JUnit 4 style test case.

        Hadoop QA added a comment -

        +1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12440828/BlockDecompressorStream.java.patch
        against trunk revision 930096.

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 3 new or modified tests.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 findbugs. The patch does not introduce any new Findbugs warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 core tests. The patch passed core unit tests.

        +1 contrib tests. The patch passed contrib unit tests.

        Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h4.grid.sp2.yahoo.net/443/testReport/
        Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h4.grid.sp2.yahoo.net/443/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h4.grid.sp2.yahoo.net/443/artifact/trunk/build/test/checkstyle-errors.html
        Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h4.grid.sp2.yahoo.net/443/console

        This message is automatically generated.

        Tom White added a comment -

        I've just committed this. Thanks Kang Xiao!

        Tom White added a comment -

        Re-opening as there was a compilation problem.

        Tom White added a comment -

        Running updated patch through Hudson.

        Hudson added a comment -

        Integrated in Hadoop-Common-trunk-Commit #399 (See https://hudson.apache.org/hudson/job/Hadoop-Common-trunk-Commit/399/)
        Reverting HADOOP-6663.
        HADOOP-6663. BlockDecompressorStream get EOF exception when decompressing the file compressed from empty file. Contributed by Kang Xiao.

        Hudson added a comment -

        Integrated in Hadoop-Common-trunk #493 (See https://hudson.apache.org/hudson/job/Hadoop-Common-trunk/493/)
        Reverting HADOOP-6663.
        HADOOP-6663. BlockDecompressorStream get EOF exception when decompressing the file compressed from empty file. Contributed by Kang Xiao.

        Tom White added a comment -

        I ran the tests and test-patch manually:

             [exec] +1 overall. 
             [exec]
             [exec]     +1 @author.  The patch does not contain any @author tags.
             [exec]
             [exec]     +1 tests included.  The patch appears to include 4 new or modified tests.
             [exec]
             [exec]     +1 javadoc.  The javadoc tool did not generate any warning messages.
             [exec]
             [exec]     +1 javac.  The applied patch does not increase the total number of javac compiler warnings.
             [exec]
             [exec]     +1 findbugs.  The patch does not introduce any new Findbugs warnings.
             [exec]
             [exec]     +1 release audit.  The applied patch does not increase the total number of release audit warnings.
             [exec]
             [exec]     +1 system tests framework.  The patch passed system tests framework compile.
        
        Tom White added a comment -

        I've just committed this.

        Hudson added a comment -

        Integrated in Hadoop-Common-trunk-Commit #404 (See https://hudson.apache.org/hudson/job/Hadoop-Common-trunk-Commit/404/)
        HADOOP-6663. BlockDecompressorStream get EOF exception when decompressing the file compressed from empty file. Contributed by Kang Xiao.

        Hudson added a comment -

        Integrated in Hadoop-Common-trunk #496 (See https://hudson.apache.org/hudson/job/Hadoop-Common-trunk/496/)
        HADOOP-6663. BlockDecompressorStream get EOF exception when decompressing the file compressed from empty file. Contributed by Kang Xiao.

        Bennie Schut added a comment -

        Copied the same change Kang Xiao made to 0.20.2 since I needed it for that version.


          People

          • Assignee:
            Kang Xiao
            Reporter:
            Kang Xiao