krishnakumar has commented on the revision "HIVE-2604 [jira] Add UberCompressor Serde/Codec to contrib which allows per-column compression strategies".
contrib/src/java/org/apache/hadoop/hive/contrib/ubercompressor/UberCompressionCodec.java:33 This is itself an implementation of the CompressionCodec interface. The only important parts of the class are the createInputStream/createOutputStream methods; the dummyCompressor exists only to conform to the interface.
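To illustrate the shape being described, here is a minimal pure-Java sketch (the interface and class names below are hypothetical stand-ins, not Hadoop's actual API): a codec interface exposes both stream factories and a raw-compressor hook, and the wrapper does its real work in the stream methods while returning a dummy object to satisfy the compressor part of the contract.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.InflaterInputStream;

// Hypothetical stand-in for a Hadoop-style codec interface.
interface Codec {
  OutputStream createOutputStream(OutputStream out) throws IOException;
  InputStream createInputStream(InputStream in) throws IOException;
  Object createCompressor(); // raw-compressor hook that this wrapper never really uses
}

// Sketch of the pattern: streams do the real work, the compressor is a dummy.
class UberStyleCodec implements Codec {
  public OutputStream createOutputStream(OutputStream out) {
    return new DeflaterOutputStream(out); // compression happens in the returned stream
  }
  public InputStream createInputStream(InputStream in) {
    return new InflaterInputStream(in);   // decompression happens in the returned stream
  }
  public Object createCompressor() {
    return new Object(); // dummy: present only so the interface is fully implemented
  }
}
```

The point of the sketch is that callers which go through the stream factories get working compression, even though the compressor object itself is inert.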
contrib/src/java/org/apache/hadoop/hive/contrib/ubercompressor/UberCompressionInputStream.java:70 Will add comments.
The method is called readFromCompressor because it reads from the input reader created off a type-specific compressor. I can rename it to readFromInputReader?
If you mean the copying annotated by the FIXME, yes, it can be avoided by having an output stream over an existing buffer. I did not find a ready-made class for that, so I will create one.
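Something along these lines is what I have in mind (a sketch only; the class name, constructor, and position() helper are all hypothetical): an OutputStream that writes directly into a caller-supplied byte[] at a given offset, so the extra copy goes away.

```java
import java.io.OutputStream;

/**
 * Hypothetical sketch: an OutputStream over an existing buffer, so data
 * lands directly in the caller's byte[] with no intermediate copy.
 */
class ExistingBufferOutputStream extends OutputStream {
  private final byte[] buf;
  private int pos;

  ExistingBufferOutputStream(byte[] buf, int offset) {
    this.buf = buf;
    this.pos = offset;
  }

  @Override
  public void write(int b) {
    if (pos >= buf.length) {
      // Fail loudly rather than silently dropping bytes.
      throw new IndexOutOfBoundsException("buffer full at position " + pos);
    }
    buf[pos++] = (byte) b;
  }

  /** Bytes consumed so far, counted from the start of the buffer. */
  int position() {
    return pos;
  }
}
```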
contrib/src/java/org/apache/hadoop/hive/contrib/ubercompressor/UberCompressionInputStream.java:101 This is the second case (in the JIRA description), where the user specifies a custom serde+codec to be used for compressing a specific column. So we need to deserialize and reserialize here.
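Schematically, the step looks like this (a sketch with hypothetical names — recode, columnDeserializer, tableSerializer — not code from the patch): the stored bytes are decoded with the column-specific deserializer and then re-encoded with the table-level serializer.

```java
import java.util.function.Function;

/**
 * Hypothetical sketch of the per-column recoding step: custom serde
 * decodes the stored bytes, the standard serde re-encodes the value.
 */
class ColumnRecode {
  static byte[] recode(byte[] stored,
                       Function<byte[], Object> columnDeserializer,
                       Function<Object, byte[]> tableSerializer) {
    Object value = columnDeserializer.apply(stored); // column-specific serde decodes
    return tableSerializer.apply(value);             // table-level serde re-encodes
  }
}
```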
contrib/src/java/org/apache/hadoop/hive/contrib/ubercompressor/UberCompressorUtils.java:38 I needed a simple fixed-width read/write on the output stream. WritableUtils implements a more complicated variable-length encoding that favors smaller values.
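For contrast, here is what I mean by simple fixed-width I/O (a sketch; the class and method names are illustrative): a plain 4-byte big-endian int, always the same size on the wire, unlike the variable-length vint scheme in WritableUtils.

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

/** Hypothetical sketch: fixed 4-byte big-endian int read/write on raw streams. */
class SimpleIntIO {
  static void writeInt(OutputStream out, int v) throws IOException {
    out.write((v >>> 24) & 0xFF);
    out.write((v >>> 16) & 0xFF);
    out.write((v >>> 8) & 0xFF);
    out.write(v & 0xFF);
  }

  static int readInt(InputStream in) throws IOException {
    int b1 = in.read(), b2 = in.read(), b3 = in.read(), b4 = in.read();
    if ((b1 | b2 | b3 | b4) < 0) {
      throw new EOFException("stream ended mid-int");
    }
    return (b1 << 24) | (b2 << 16) | (b3 << 8) | b4;
  }
}
```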
contrib/src/java/org/apache/hadoop/hive/contrib/ubercompressor/dsalg/Tuple.java:1 data structures and algorithms!
contrib/src/test/queries/clientpositive/ubercompressor.q:4 The configs are modelled on existing config for compression, so I guess that means that all output tables will be compressed using the same config?
The codec and its child classes do not have access to table/partition, right? How would we populate the metastore from codec implementation classes?