Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.19.0, 0.20.0, 0.21.0
    • Fix Version/s: 0.19.1
    • Component/s: io
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      bzip2 does not work with SequenceFile. Creating a block-compressed writer with BZip2Codec fails:

          String codec = "org.apache.hadoop.io.compress.BZip2Codec";
          SequenceFile.Writer writer = SequenceFile.createWriter(fs, conf, new Path(output), 
              reader.getKeyClass(), reader.getValueClass(), CompressionType.BLOCK, 
              (CompressionCodec)Class.forName(codec).newInstance());
      

      The stack trace is here:

      java.lang.UnsupportedOperationException
              at org.apache.hadoop.io.compress.BZip2Codec.getCompressorType(BZip2Codec.java:80)
              at org.apache.hadoop.io.compress.CodecPool.getCompressor(CodecPool.java:98)
              at org.apache.hadoop.io.SequenceFile$Writer.init(SequenceFile.java:914)
              at org.apache.hadoop.io.SequenceFile$BlockCompressWriter.<init>(SequenceFile.java:1198)
              at org.apache.hadoop.io.SequenceFile.createWriter(SequenceFile.java:401)
              at org.apache.hadoop.io.SequenceFile.createWriter(SequenceFile.java:329)
              at org.apache.hadoop.mapred.TestSequenceFileBZip.main(TestSequenceFileBZip.java:43)
              at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
              at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
              at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
              at java.lang.reflect.Method.invoke(Method.java:597)
              at org.apache.hadoop.util.RunJar.main(RunJar.java:165)
              at org.apache.hadoop.mapred.JobShell.run(JobShell.java:54)
              at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
              at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
              at org.apache.hadoop.mapred.JobShell.main(JobShell.java:68)
      
      1. HADOOP-4918.3.patch
        15 kB
        Zheng Shao
      2. HADOOP-4918.3.0.19.patch
        15 kB
        Zheng Shao
      3. HADOOP-4918.3.0.20.patch
        15 kB
        Zheng Shao

        Issue Links

          Activity

          Nigel Daley made changes -
          Status Resolved [ 5 ] Closed [ 6 ]
          Zheng Shao made changes -
          Link This issue is blocked by HADOOP-5213 [ HADOOP-5213 ]
          dhruba borthakur made changes -
          Status Patch Available [ 10002 ] Resolved [ 5 ]
          Hadoop Flags [Reviewed]
          Fix Version/s 0.20.0 [ 12313438 ]
          Fix Version/s 0.21.0 [ 12313563 ]
          Resolution Fixed [ 1 ]
          dhruba borthakur added a comment -

          I just committed this. Thanks Zheng!

          dhruba borthakur added a comment -

          Ok, thanks. I will commit this to 0.19, 0.20 and trunk.

          Doug Cutting added a comment -

          > BZip2Codec does not work with SequenceFile right now in 0.19. I consider that to be a bug.

          The standard criterion is whether it is a regression. Did it work in a release prior to 0.19? If so, then it's a regression and should be fixed in 0.19. If not, then it's a new feature and should be added in 0.20.

          However sometimes, as an exception, we permit fixes to all-new code, e.g., a new contrib module, that are not regressions, if they have zero chance of causing a regression anywhere else. This patch touches only files that were added in 0.19, and those files were themselves an independent addition (http://svn.apache.org/viewvc?view=rev&revision=680802), so I see no possibility for this creating any regressions in 0.19 and would not oppose treating it as such an exception.

          Zheng Shao added a comment -

          Log for 0.20 test:

          [exec]
          [exec] BUILD SUCCESSFUL
          [exec] Total time: 3 minutes 56 seconds
          [exec] Starting with /home/zshao/tmp/trunkFindbugsWarnings.xml
          [exec] Merging /home/zshao/tmp/patchFindbugsWarnings.xml
          [exec]
          [exec]
          [exec] ======================================================================
          [exec] ======================================================================
          [exec] Running Eclipse classpath verification.
          [exec] ======================================================================
          [exec] ======================================================================
          [exec]
          [exec]
          [exec]
          [exec]
          [exec]
          [exec]
          [exec] +1 overall.
          [exec]
          [exec] +1 @author. The patch does not contain any @author tags.
          [exec]
          [exec] +1 tests included. The patch appears to include 3 new or modified tests.
          [exec]
          [exec] +1 javadoc. The javadoc tool did not generate any warning messages.
          [exec]
          [exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings.
          [exec]
          [exec] +1 findbugs. The patch does not introduce any new Findbugs warnings.
          [exec]
          [exec] +1 Eclipse classpath. The patch retains Eclipse classpath integrity.
          [exec]
          [exec]
          [exec]
          [exec]
          [exec] ======================================================================
          [exec] ======================================================================
          [exec] Finished build.
          [exec] ======================================================================
          [exec] ======================================================================
          [exec]
          [exec]

          BUILD SUCCESSFUL
          Total time: 14 minutes 57 seconds

          Zheng Shao made changes -
          Affects Version/s 0.20.0 [ 12313438 ]
          Summary Make bzip2 work with SequenceFile Fix bzip2 work with SequenceFile
          Issue Type Improvement [ 4 ] Bug [ 1 ]
          Affects Version/s 0.19.0 [ 12313211 ]
          Affects Version/s 0.21.0 [ 12313563 ]
          Component/s io [ 12310687 ]
          Component/s mapred [ 12310690 ]
          Fix Version/s 0.20.0 [ 12313438 ]
          Fix Version/s 0.19.1 [ 12313473 ]
          Zheng Shao added a comment -

          @Abdul, the only difference between the 0.19 patch and the 0.20 patch (the trunk patch is the same as the 0.20 patch) is a line offset. They are essentially the same patch.

          @Dhruba, BZip2Codec does not work with SequenceFile right now in 0.19. I consider that to be a bug. What do you think?

          Zheng Shao added a comment -

          Log for 0.19 test:

          [exec] BUILD SUCCESSFUL
          [exec] Total time: 3 minutes 19 seconds
          [exec] Starting with /home/zshao/tmp/trunkFindbugsWarnings.xml
          [exec] Merging /home/zshao/tmp/patchFindbugsWarnings.xml
          [exec]
          [exec]
          [exec]
          [exec]
          [exec] +1 overall.
          [exec]
          [exec] +1 @author. The patch does not contain any @author tags.
          [exec]
          [exec] +1 tests included. The patch appears to include 3 new or modified tests.
          [exec]
          [exec] +1 javadoc. The javadoc tool did not generate any warning messages.
          [exec]
          [exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings.
          [exec]
          [exec] +1 findbugs. The patch does not introduce any new Findbugs warnings.
          [exec]
          [exec]
          [exec]
          [exec]
          [exec] ======================================================================
          [exec] ======================================================================
          [exec] Finished build.
          [exec] ======================================================================
          [exec] ======================================================================
          [exec]
          [exec]

          Zheng Shao made changes -
          Attachment HADOOP-4918.3.0.20.patch [ 12397942 ]
          Zheng Shao added a comment -

          Patch for 0.20.

          dhruba borthakur added a comment -

          Are you guys saying that this patch needs to go into 0.19 as well? But this is a new feature, isn't it?

          Abdul Qadeer added a comment -

          Zheng,

          I went through all of your BZip2-related changes to my previous BZip2Codec code, and they look fine to me,
          so +1 as far as the BZip2-related changes are concerned.

          You just uploaded a new patch for the 0.19 branch. Why do we need different patches for different branches?

          Zheng Shao made changes -
          Attachment HADOOP-4918.3.0.19.patch [ 12397940 ]
          Zheng Shao added a comment -

          Patch for branch 0.19.

          Zheng Shao added a comment -

          [exec]
          [exec] BUILD SUCCESSFUL
          [exec] Total time: 9 minutes 8 seconds
          [exec] Starting with /home/zshao/tmp/trunkFindbugsWarnings.xml
          [exec] Merging /home/zshao/tmp/patchFindbugsWarnings.xml
          [exec]
          [exec]
          [exec] ======================================================================
          [exec] ======================================================================
          [exec] Running Eclipse classpath verification.
          [exec] ======================================================================
          [exec] ======================================================================
          [exec]
          [exec]
          [exec]
          [exec]
          [exec]
          [exec]
          [exec] +1 overall.
          [exec]
          [exec] +1 @author. The patch does not contain any @author tags.
          [exec]
          [exec] +1 tests included. The patch appears to include 3 new or modified tests.
          [exec]
          [exec] +1 javadoc. The javadoc tool did not generate any warning messages.
          [exec]
          [exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings.
          [exec]
          [exec] +1 findbugs. The patch does not introduce any new Findbugs warnings.
          [exec]
          [exec] +1 Eclipse classpath. The patch retains Eclipse classpath integrity.
          [exec]
          [exec]
          [exec]
          [exec]
          [exec] ======================================================================
          [exec] ======================================================================
          [exec] Finished build.
          [exec] ======================================================================
          [exec] ======================================================================
          [exec]
          [exec]

          BUILD SUCCESSFUL
          Total time: 30 minutes 55 seconds

          Zheng Shao made changes -
          Attachment HADOOP-4918.2.patch [ 12397869 ]
          Zheng Shao made changes -
          Attachment HADOOP-4918.3.patch [ 12397921 ]
          Zheng Shao added a comment -

          Added some new files (forgot to svn add last time)

          Zheng Shao made changes -
          Status Open [ 1 ] Patch Available [ 10002 ]
          Zheng Shao made changes -
          Status Patch Available [ 10002 ] Open [ 1 ]
          Zheng Shao made changes -
          Assignee Zheng Shao [ zshao ]
          Zheng Shao made changes -
          Attachment TestSequenceFileBZip.java [ 12396444 ]
          Zheng Shao made changes -
          Attachment HADOOP-4918.1.patch [ 12397767 ]
          Zheng Shao made changes -
          Attachment HADOOP-4918.2.patch [ 12397869 ]
          Zheng Shao added a comment -

          Added a test case.

          Zheng Shao made changes -
          Status Open [ 1 ] Patch Available [ 10002 ]
          Fix Version/s 0.21.0 [ 12313563 ]
          Zheng Shao made changes -
          Attachment HADOOP-4918.1.patch [ 12397767 ]
          Zheng Shao added a comment -

          Modified the BZip2Codec implementation a bit, and changed the way Hadoop interacts with the BZip2 implementation to make sure SequenceFile works with BZip2Codec.

          Zheng Shao added a comment -

          I saw this piece of code in TestCodec.java.

          Unfortunately, SequenceFile.BlockCompressWriter does not call close() on deflateOut for each block, so the codec never writes its final output and the compressed block is incomplete.

              //Necessary to close the stream for BZip2 Codec to write its final output.  Flush is not enough.
              deflateOut.close();
          

          We will probably need to modify BZip2 Codec to make this work.
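
The flush-versus-close behaviour described above can be reproduced with the JDK alone. The following is a minimal sketch, not the Hadoop code: `GZIPOutputStream` stands in for the BZip2 output stream (the JDK ships no bzip2 implementation), and the class name `FlushVsClose` and its helper methods are hypothetical names chosen for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Illustration only: GZIPOutputStream is an assumed stand-in for the BZip2 stream.
class FlushVsClose {

    /** Returns {bytes in the sink after flush(), bytes in the sink after close()}. */
    static int[] sizesAfterFlushAndClose(byte[] data) {
        try {
            ByteArrayOutputStream sink = new ByteArrayOutputStream();
            GZIPOutputStream out = new GZIPOutputStream(sink);
            out.write(data);
            out.flush();                  // does not finish the compressed stream
            int afterFlush = sink.size();
            out.close();                  // writes the remaining data and the trailer
            int afterClose = sink.size();
            return new int[] {afterFlush, afterClose};
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Compresses data and closes the stream, so the output is complete. */
    static byte[] compressAndClose(byte[] data) {
        try {
            ByteArrayOutputStream sink = new ByteArrayOutputStream();
            try (GZIPOutputStream out = new GZIPOutputStream(sink)) {
                out.write(data);
            }
            return sink.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Decompresses a complete (closed) compressed buffer. */
    static byte[] decompress(byte[] compressed) {
        try (GZIPInputStream in = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            ByteArrayOutputStream result = new ByteArrayOutputStream();
            byte[] buf = new byte[4096];
            int n;
            while ((n = in.read(buf)) != -1) {
                result.write(buf, 0, n);
            }
            return result.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] data = "one SequenceFile block".getBytes(StandardCharsets.UTF_8);
        int[] sizes = sizesAfterFlushAndClose(data);
        System.out.println("after flush: " + sizes[0] + ", after close: " + sizes[1]);
    }
}
```

The sink keeps growing after flush() because the end-of-stream data is only emitted on close(). A writer that flushes per block without closing therefore leaves each block truncated.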

          Zheng Shao added a comment -

          It seems the reason is that BZip2Codec did not split the Compressor and Stream.

          One simple approach to make it work is to have a dummy Compressor and Decompressor class.
          So the following 6 functions can be implemented using the dummy classes:

          createOutputStream(OutputStream, Compressor)
          getCompressorType()
          createCompressor()
          createInputStream(InputStream, Decompressor)
          getDecompressorType()
          createDecompressor()
          

          by using

          createOutputStream(OutputStream)
          createInputStream(InputStream)
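
The dummy-class approach listed above can be sketched without Hadoop on the classpath. Everything here is an assumption made for illustration: the nested `Codec`, `Compressor` and `Decompressor` interfaces are reduced stand-ins for the Hadoop types, `StreamOnlyCodec` is a hypothetical name, and GZIP streams from the JDK replace the BZip2 streams. It is not the code from the attached patch.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

class DummyCompressorSketch {

    // Hypothetical, reduced stand-ins for the Hadoop interfaces.
    interface Compressor {}
    interface Decompressor {}

    interface Codec {
        OutputStream createOutputStream(OutputStream out);
        OutputStream createOutputStream(OutputStream out, Compressor compressor);
        Class<? extends Compressor> getCompressorType();
        Compressor createCompressor();
        InputStream createInputStream(InputStream in);
        InputStream createInputStream(InputStream in, Decompressor decompressor);
        Class<? extends Decompressor> getDecompressorType();
        Decompressor createDecompressor();
    }

    // Dummy classes: they carry no state and do no work.
    static final class DummyCompressor implements Compressor {}
    static final class DummyDecompressor implements Decompressor {}

    /** A codec whose real work lives entirely in its streams. */
    static final class StreamOnlyCodec implements Codec {
        public OutputStream createOutputStream(OutputStream out) {
            try {
                return new GZIPOutputStream(out); // assumed stand-in for the BZip2 stream
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }

        // The six pool-facing methods delegate to the stream-only ones
        // or hand out dummies instead of throwing.
        public OutputStream createOutputStream(OutputStream out, Compressor ignored) {
            return createOutputStream(out);
        }
        public Class<? extends Compressor> getCompressorType() { return DummyCompressor.class; }
        public Compressor createCompressor() { return new DummyCompressor(); }

        public InputStream createInputStream(InputStream in) {
            try {
                return new GZIPInputStream(in);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
        public InputStream createInputStream(InputStream in, Decompressor ignored) {
            return createInputStream(in);
        }
        public Class<? extends Decompressor> getDecompressorType() { return DummyDecompressor.class; }
        public Decompressor createDecompressor() { return new DummyDecompressor(); }
    }

    /** Uses only the compressor-taking overloads, as a pool-based caller would. */
    static byte[] roundTrip(Codec codec, byte[] data) {
        try {
            ByteArrayOutputStream sink = new ByteArrayOutputStream();
            try (OutputStream out = codec.createOutputStream(sink, codec.createCompressor())) {
                out.write(data);
            }
            ByteArrayOutputStream result = new ByteArrayOutputStream();
            try (InputStream in = codec.createInputStream(
                    new ByteArrayInputStream(sink.toByteArray()), codec.createDecompressor())) {
                byte[] buf = new byte[4096];
                int n;
                while ((n = in.read(buf)) != -1) {
                    result.write(buf, 0, n);
                }
            }
            return result.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

With this shape, a caller that asks for the compressor type first no longer hits an UnsupportedOperationException. The dummy objects exist only to satisfy the interface.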
          
          Zheng Shao made changes -
          Link This issue relates to HADOOP-3646 [ HADOOP-3646 ]
          Zheng Shao made changes -
          Field Original Value New Value
          Attachment TestSequenceFileBZip.java [ 12396444 ]
          Zheng Shao added a comment -

          A Java class that reads a SequenceFile and writes it out with a specified codec and block size.

          This is used to test the time and space efficiency of bzip2 with SequenceFile.

          Zheng Shao created issue -

            People

            • Assignee:
              Zheng Shao
              Reporter:
              Zheng Shao
            • Votes:
              0
              Watchers:
              6
