Hadoop Common: HADOOP-8423

MapFile.Reader.get() crashes jvm or throws EOFException on Snappy or LZO block-compressed data

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.20.2
    • Fix Version/s: 1.2.0, 2.0.2-alpha
    • Component/s: io
    • Labels:
      None
    • Environment:

      Linux 2.6.32.23-0.3-default #1 SMP 2010-10-07 14:57:45 +0200 x86_64 x86_64 x86_64 GNU/Linux

    • Target Version/s:
    • Hadoop Flags:
      Reviewed

      Description

      I am using Cloudera distribution cdh3u1.

      While evaluating native codecs such as Snappy and LZO for better
      decompression performance, I ran into issues with random access
      via the MapFile.Reader.get(key, value) method.
      The first call to MapFile.Reader.get() works, but a second call fails.

      I also get different exceptions depending on the number of entries
      in the map file.
      With LzoCodec and a 10-record file, the JVM aborts.

      At the same time, DefaultCodec works fine in all cases, as does
      record compression for the native codecs.

      I created a simple test program (attached) that creates map files
      locally with sizes of 10 and 100 records for three codecs: Default,
      Snappy, and LZO.
      (The test requires the corresponding native libraries to be available.)

      A summary of the problems is given below:

      Map Size: 100
      Compression: RECORD
      ==================
      DefaultCodec: OK
      SnappyCodec: OK
      LzoCodec: OK

      Map Size: 10
      Compression: RECORD
      ==================
      DefaultCodec: OK
      SnappyCodec: OK
      LzoCodec: OK

      Map Size: 100
      Compression: BLOCK
      ================
      DefaultCodec: OK

      SnappyCodec: java.io.EOFException at
      org.apache.hadoop.io.compress.BlockDecompressorStream.getCompressedData(BlockDecompressorStream.java:114)

      LzoCodec: java.io.EOFException at
      org.apache.hadoop.io.compress.BlockDecompressorStream.getCompressedData(BlockDecompressorStream.java:114)

      Map Size: 10
      Compression: BLOCK
      ==================
      DefaultCodec: OK

      SnappyCodec: java.lang.NoClassDefFoundError: Ljava/lang/InternalError
      at org.apache.hadoop.io.compress.snappy.SnappyDecompressor.decompressBytesDirect(Native Method)

      LzoCodec:

      # A fatal error has been detected by the Java Runtime Environment:
      #
      # SIGSEGV (0xb) at pc=0x00002b068ffcbc00, pid=6385, tid=47304763508496
      #
      # JRE version: 6.0_21-b07
      # Java VM: Java HotSpot(TM) 64-Bit Server VM (17.0-b17 mixed mode linux-amd64)
      # Problematic frame:
      # C [liblzo2.so.2+0x13c00] lzo1x_decompress+0x1a0
        Attachments

      1. MapFileCodecTest.java (4 kB, Jason B)
      2. HADOOP-8423-branch-1.patch (3 kB, Harsh J)
      3. HADOOP-8423-branch-1.patch (3 kB, Harsh J)
      4. hadoop-8423.txt (4 kB, Todd Lipcon)

        Activity

        Jason B added a comment -

        Test program

        Todd Lipcon added a comment -

        I'll take a look at this soon, thanks Jason.

        Todd Lipcon added a comment -

        Attached patch should fix the issue. I only tested with Snappy, not LZO, so please let me know if LZO doesn't work.

        The issue was that BlockDecompressorStream wasn't resetting its own state when resetState() was called. So, when reseeking in the SequenceFile.Reader, it would get "out of sync" - and be at the beginning of a block but think it was in the middle of a block. So, the codec got invalid data fed to it.
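        Todd's diagnosis can be illustrated with a self-contained toy (this is an illustrative analogue, not Hadoop's actual BlockDecompressorStream): a stream that reads length-prefixed blocks and tracks how many payload bytes remain in the current block. If resetState() fails to clear that counter, a reseek to a block boundary leaves the stream thinking it is mid-block, and it feeds a header byte through as data:

```java
public class Main {
    /** Toy analogue of a block-compressed input stream (illustrative only).
     *  Data is stored as [length byte][payload bytes] per block. */
    static class ToyBlockStream {
        private final byte[] data;
        private int pos = 0;
        private int remainingInBlock = 0; // payload bytes left in current block
        private final boolean fixedReset;

        ToyBlockStream(byte[] data, boolean fixedReset) {
            this.data = data;
            this.fixedReset = fixedReset;
        }

        int read() {
            if (remainingInBlock == 0) {       // at a block boundary: parse header
                remainingInBlock = data[pos++];
            }
            remainingInBlock--;
            return data[pos++];
        }

        /** Reposition to a block boundary, as SequenceFile.Reader does on seek. */
        void seekToBlock(int blockStart) {
            pos = blockStart;
            resetState();
        }

        void resetState() {
            if (fixedReset) {
                remainingInBlock = 0;  // the fix: forget the old block's state
            }
            // buggy version: stale remainingInBlock survives the seek,
            // so the stream believes it is still mid-block
        }
    }

    static int firstByteAfterReseek(boolean fixedReset) {
        byte[] blocks = {3, 'a', 'b', 'c', 2, 'd', 'e'};
        ToyBlockStream s = new ToyBlockStream(blocks, fixedReset);
        s.read();             // consume one record from the first block
        s.seekToBlock(0);     // re-seek to the start, like a second MapFile.Reader.get()
        return s.read();      // should yield 'a' again
    }

    public static void main(String[] args) {
        // buggy reset returns 3 (the header byte misread as payload);
        // fixed reset returns 97 ('a'), the real first record byte
        System.out.println("buggy : " + firstByteAfterReseek(false));
        System.out.println("fixed : " + firstByteAfterReseek(true));
    }
}
```

        In the real codepath the misread bytes go to the native decompressor, which explains both failure modes in the report: an EOFException when the garbage parses as an impossible length, and a SIGSEGV when it reaches liblzo2 directly.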

        Hadoop QA added a comment -

        +1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12531144/hadoop-8423.txt
        against trunk revision .

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 1 new or modified test files.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 eclipse:eclipse. The patch built with eclipse:eclipse.

        +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 core tests. The patch passed unit tests in hadoop-common-project/hadoop-common.

        +1 contrib tests. The patch passed contrib unit tests.

        Test results: https://builds.apache.org/job/PreCommit-HADOOP-Build/1089//testReport/
        Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/1089//console

        This message is automatically generated.

        Harsh J added a comment -

        The issue was that BlockDecompressorStream wasn't resetting its own state when resetState() was called. So, when reseeking in the SequenceFile.Reader, it would get "out of sync" - and be at the beginning of a block but think it was in the middle of a block. So, the codec got invalid data fed to it.

        +1. I applied the patch minus the fix (i.e., just the test) and it fails; it passes with the fix.

        Committing shortly.

        Harsh J added a comment -

        Committed to branch-2 and trunk, thanks Todd! Leaving open for branch-1 since it was mentioned in the Target Versions.

        Todd - Will you be doing the branch-1 version or want me to handle that?

        Todd Lipcon added a comment -

        If you have time, feel free. Otherwise I'll try to get to it soon.

        Hudson added a comment -

        Integrated in Hadoop-Hdfs-trunk-Commit #2508 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk-Commit/2508/)
        HADOOP-8423. MapFile.Reader.get() crashes jvm or throws EOFException on Snappy or LZO block-compressed data. Contributed by Todd Lipcon. (harsh) (Revision 1359866)

        Result = SUCCESS
        harsh : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1359866
        Files :

        • /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt
        • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/compress/BlockDecompressorStream.java
        • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/io/compress/TestCodec.java
        Hudson added a comment -

        Integrated in Hadoop-Common-trunk-Commit #2441 (See https://builds.apache.org/job/Hadoop-Common-trunk-Commit/2441/)
        HADOOP-8423. MapFile.Reader.get() crashes jvm or throws EOFException on Snappy or LZO block-compressed data. Contributed by Todd Lipcon. (harsh) (Revision 1359866)

        Result = SUCCESS
        harsh : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1359866
        Files :

        • /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt
        • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/compress/BlockDecompressorStream.java
        • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/io/compress/TestCodec.java
        Harsh J added a comment -

        Todd,

        Attached is the patch for branch-1.

        test-patch results:

        
             [exec] -1 overall.  
             [exec] 
             [exec]     +1 @author.  The patch does not contain any @author tags.
             [exec] 
             [exec]     +1 tests included.  The patch appears to include 3 new or modified tests.
             [exec] 
             [exec]     +1 javadoc.  The javadoc tool did not generate any warning messages.
             [exec] 
             [exec]     +1 javac.  The applied patch does not increase the total number of javac compiler warnings.
             [exec] 
             [exec]     -1 findbugs.  The patch appears to introduce 218 new Findbugs (version 2.0.1-rc3) warnings.
        

        (The 218 Findbugs warnings are consistent with my other runs, such as MAPREDUCE-4414.)

        If this looks OK to you, I'll add it in.

        The only major difference I had to make was to rely on an if-else clause instead of an assume clause, as branch-1 TestCodec is a JUnit3 test.

        Thanks!

        Todd Lipcon added a comment -

        Instead of failing if Snappy isn't available, the test should pass - the idea is that not all build envs have snappy, so we only want to test it when it's available. Perhaps change the fail to a println.

        Harsh J added a comment -

        I did think I shouldn't fail the test, but was unclear whether a println would provide any visibility. Done; switched to a stderr println.
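        The guard pattern being discussed can be sketched as follows. This is an illustrative stand-in, not the actual TestCodec change; isSnappyAvailable() is a hypothetical placeholder for the real native-library check, and the body is stubbed out:

```java
public class Main {
    // Hypothetical placeholder for the real check (the actual test consults
    // the codec's native-library loader); hard-coded false for illustration.
    static boolean isSnappyAvailable() {
        return false;
    }

    // JUnit3-style guard: with no Assume support, the test wraps its body in
    // an if/else and reports the skip on stderr instead of calling fail(),
    // so build environments without Snappy still pass.
    static String testSnappyBlockReseek() {
        if (!isSnappyAvailable()) {
            System.err.println("Snappy native library not loaded: skipping test");
            return "skipped";
        }
        // ... would exercise reseeks against a Snappy block-compressed file ...
        return "ran";
    }

    public static void main(String[] args) {
        System.out.println(testSnappyBlockReseek());
    }
}
```

        Under JUnit4 the same intent would be expressed with an assume clause, which marks the test as skipped rather than silently passing.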

        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12535932/HADOOP-8423-branch-1.patch
        against trunk revision .

        -1 patch. The patch command could not apply the patch.

        Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/1187//console

        This message is automatically generated.

        Harsh J added a comment -

        Canceling patch as remaining work is for branch-1.

        Todd, does the above new patch look good to you?

        Hudson added a comment -

        Integrated in Hadoop-Hdfs-trunk #1100 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk/1100/)
        HADOOP-8423. MapFile.Reader.get() crashes jvm or throws EOFException on Snappy or LZO block-compressed data. Contributed by Todd Lipcon. (harsh) (Revision 1359866)

        Result = FAILURE
        harsh : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1359866
        Files :

        • /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt
        • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/compress/BlockDecompressorStream.java
        • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/io/compress/TestCodec.java
        Hudson added a comment -

        Integrated in Hadoop-Mapreduce-trunk #1133 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1133/)
        HADOOP-8423. MapFile.Reader.get() crashes jvm or throws EOFException on Snappy or LZO block-compressed data. Contributed by Todd Lipcon. (harsh) (Revision 1359866)

        Result = SUCCESS
        harsh : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1359866
        Files :

        • /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt
        • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/compress/BlockDecompressorStream.java
        • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/io/compress/TestCodec.java
        Todd Lipcon added a comment -

        Yep, +1. Thanks for backporting, Harsh!

        Hudson added a comment -

        Integrated in Hadoop-Hdfs-0.23-Build #311 (See https://builds.apache.org/job/Hadoop-Hdfs-0.23-Build/311/)
        svn merge -c 1359866 FIXES: HADOOP-8423. MapFile.Reader.get() crashes jvm or throws EOFException on Snappy or LZO block-compressed data. Contributed by Todd Lipcon. (Revision 1360264)

        Result = SUCCESS
        bobby : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1360264
        Files :

        • /hadoop/common/branches/branch-0.23/hadoop-common-project/hadoop-common/CHANGES.txt
        • /hadoop/common/branches/branch-0.23/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/compress/BlockDecompressorStream.java
        • /hadoop/common/branches/branch-0.23/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/io/compress/TestCodec.java
        Todd Lipcon added a comment -

        Committed to branch-1 for 1.2. Thanks for backporting, Harsh.


          People

          • Assignee: Todd Lipcon
          • Reporter: Jason B
          • Votes: 0
          • Watchers: 9