|
It seems the reason is that BZip2Codec did not split the Compressor and Stream.
One simple approach to make it work is to have a dummy Compressor and Decompressor class. createOutputStream(OutputStream, Compressor) getCompressorType() createCompressor() createInputStream(InputStream, Decompressor) getDecompressorType() createDecompressor() by using createOutputStream(OutputStream) createInputStream(InputStream) I saw this piece of code in TestCodec.java.
Unfortunately SequenceFileWriter.BlockCompressWriter is not calling close() on the deflateOut for each block. As a result, the codec is not working. //Necessary to close the stream for BZip2 Codec to write its final output. Flush is not enough.
deflateOut.close();
We will probably need to modify BZip2 Codec to make this work. Modified the BZip2Codec implementation a bit, and changed the way hadoop interact with the BZip2 implementation to make sure SequenceFile works BZip2Codec.
Added some new files (forgot to svn add last time)
[exec]
[exec] BUILD SUCCESSFUL [exec] Total time: 9 minutes 8 seconds [exec] Starting with /home/zshao/tmp/trunkFindbugsWarnings.xml [exec] Merging /home/zshao/tmp/patchFindbugsWarnings.xml [exec] [exec] [exec] ====================================================================== [exec] ====================================================================== [exec] Running Eclipse classpath verification. [exec] ====================================================================== [exec] ====================================================================== [exec] [exec] [exec] [exec] [exec] [exec] [exec] +1 overall. [exec] [exec] +1 @author. The patch does not contain any @author tags. [exec] [exec] +1 tests included. The patch appears to include 3 new or modified tests. [exec] [exec] +1 javadoc. The javadoc tool did not generate any warning messages. [exec] [exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings. [exec] [exec] +1 findbugs. The patch does not introduce any new Findbugs warnings. [exec] [exec] +1 Eclipse classpath. The patch retains Eclipse classpath integrity. [exec] [exec] [exec] [exec] [exec] ====================================================================== [exec] ====================================================================== [exec] Finished build. [exec] ====================================================================== [exec] ====================================================================== [exec] [exec] BUILD SUCCESSFUL Zheng,
I went through all of your BZip2 related changes to my previous BZip2Codec code and they are looking fine to me. You just uploaded a new patch for 0.19 branch. Why we need different patches for different branches? Are you guys saying that this patch needs to go into 0.19 as well? But this is a new feature, isn't it?
Log for 0.19 test:
[exec] BUILD SUCCESSFUL @Abdul, the only difference between 0.19 patch and 0.20 patch (trunk patchis the same as 0.20 patch) is a line offset difference. Basically it's essentially the same patch.
@Drhuba, the BZip2Codec does not work with SequenceFile right now in 0.19. I consider that to be a bug. What do you think? Log for 0.20 test:
[exec] BUILD SUCCESSFUL > BZip2Codec does not work with SequenceFile right now in 0.19. I consider that to be a bug.
The standard criteria is whether it is a regression. Did it work in a release prior to 0.19? If so, then it's a regression and should be fixed in 0.19. If not then it's a new feature, and should be added in 0.20. However sometimes, as an exception, we permit fixes to all-new code, e.g., a new contrib module, that are not regressions, if they have zero chance of causing a regression anywhere else. This patch touches only files that were added in 0.19, and those files were themselves an independent addition (http://svn.apache.org/viewvc?view=rev&revision=680802 Ok, thanks. I will commit this to 0.19, 0.20 and trunk.
I just committed this. Thanks Zheng!
|
|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
This is used to test the time and space efficiency of bzip2 with SequenceFile.