Details
Type: Bug
Status: Open
Priority: Major
Resolution: Unresolved
Description
It appears that AvroSequenceFile does not pass the compression type and codec down to SequenceFile.Writer. This is because AvroSequenceFile.Writer calls SequenceFile.Writer's public constructor directly rather than using one of the SequenceFile.createWriter factory methods.
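To illustrate why a direct constructor call can silently drop the compression settings, here is a minimal stand-in model. It is not Hadoop's or Avro's actual code: the class WriterFactorySketch, the methods buggyWrapper and fixedWrapper, and the string codec name are all hypothetical, and the sketch only assumes that the factory method is the place where the compression type selects a compressing writer.

```java
public class WriterFactorySketch {
    enum CompressionType { NONE, RECORD, BLOCK }

    // Stand-in for a base writer: constructing it directly always yields
    // uncompressed output, whatever the caller asked for.
    static class Writer {
        String describe() { return "uncompressed"; }
    }

    // Hypothetical compressing variants, only reachable through the factory.
    static class RecordCompressWriter extends Writer {
        private final String codec;
        RecordCompressWriter(String codec) { this.codec = codec; }
        @Override String describe() { return "record-compressed with " + codec; }
    }

    static class BlockCompressWriter extends Writer {
        private final String codec;
        BlockCompressWriter(String codec) { this.codec = codec; }
        @Override String describe() { return "block-compressed with " + codec; }
    }

    // Stand-in for a createWriter-style factory: the compression type
    // decides which writer implementation is returned.
    static Writer createWriter(CompressionType type, String codec) {
        switch (type) {
            case RECORD: return new RecordCompressWriter(codec);
            case BLOCK:  return new BlockCompressWriter(codec);
            default:     return new Writer();
        }
    }

    // Models the reported behavior: the options are accepted but the base
    // constructor is called, so they never take effect.
    static Writer buggyWrapper(CompressionType type, String codec) {
        return new Writer();
    }

    // Models the workaround: delegate to the factory instead.
    static Writer fixedWrapper(CompressionType type, String codec) {
        return createWriter(type, codec);
    }

    public static void main(String[] args) {
        System.out.println(buggyWrapper(CompressionType.BLOCK, "bzip2").describe());
        System.out.println(fixedWrapper(CompressionType.BLOCK, "bzip2").describe());
    }
}
```

In this model the buggy wrapper reports an uncompressed writer even though BLOCK compression was requested, while the wrapper that goes through the factory reports a block-compressed one.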
Here is a bit of workaround code that I came up with:
    AvroSequenceFile.Writer.Options options = new AvroSequenceFile.Writer.Options()
        .withConfiguration(hdfsInfo.getConf())
        .withFileSystem(hdfsInfo.getFileSystem())
        .withOutputPath(hdfsInfo.getPath())
        .withCompressionType(configuration.getCompressionType())
        .withCompressionCodec(configuration.getCompressionCodec().getCodec())
        .withProgressable(new Progressable() {
            @Override
            public void progress() {
            }
        })
        .withKeySchema(configuration.getKeySchema())
        .withValueSchema(configuration.getValueSchema());

    // Have to do this here b/c it's hidden in a private method :(
    Metadata metadata = options.getMetadata();
    if (null != configuration.getKeySchema()) {
        metadata.set(AvroSequenceFile.METADATA_FIELD_KEY_SCHEMA,
            new Text(configuration.getKeySchema().toString()));
    }
    if (null != configuration.getValueSchema()) {
        metadata.set(AvroSequenceFile.METADATA_FIELD_VALUE_SCHEMA,
            new Text(configuration.getValueSchema().toString()));
    }

    return SequenceFile.createWriter(
        options.getFileSystem(),
        options.getConfigurationWithAvroSerialization(),
        options.getOutputPath(),
        options.getKeyClass(),
        options.getValueClass(),
        options.getBufferSizeBytes(),
        options.getReplicationFactor(),
        options.getBlockSizeBytes(),
        options.getCompressionType(),
        options.getCompressionCodec(),
        options.getProgressable(),
        metadata);
I used this code to write a BZIP2 block-compressed sequence file, and was able to read it back using the Avro mapreduce classes just fine.