Chris, thanks for the review.
* Different properties for the output key/value classes aren't necessary; you can use the existing methods, like JobConf::getOutputKeyClass.
The reason I did it this way is that I want to use this output format with C++ Pipes.
(1) c++-reducer ---> (2) Java-PipesReducer ---> (3) collector ---> (4) SequenceFile(AsBinary)...
Step (2), mapred/pipes/PipesReducer, calls job.getOutputKeyClass() and job.getOutputValueClass(),
but I want those outputs to be BytesWritable, not the key/value classes of the SequenceFile.
How about this: just as mapoutputkeyclass uses outputkeyclass as its default, we'll fall back to
outputkeyclass if SequenceFileOutputKeyClass is not defined in the config.
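A minimal sketch of that fallback, runnable without Hadoop on the classpath. SketchConf is a hypothetical stand-in for JobConf, and both property names and the LongWritable default are assumptions made for illustration, not necessarily what the patch uses.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for JobConf so the sketch runs without Hadoop. */
final class SketchConf {
    private final Map<String, String> props = new HashMap<>();

    void set(String name, String value) {
        props.put(name, value);
    }

    /** Returns the configured value, or defaultValue when the property is unset. */
    String get(String name, String defaultValue) {
        return props.getOrDefault(name, defaultValue);
    }
}

final class SeqFileKeyClassFallback {
    // Both property names are assumptions made for this sketch.
    static final String OUTPUT_KEY_CLASS = "mapred.output.key.class";
    static final String SEQ_FILE_OUTPUT_KEY_CLASS = "mapred.seqbinary.output.key.class";
    // Assumed job-level default when nothing is configured.
    static final String DEFAULT_KEY_CLASS = "org.apache.hadoop.io.LongWritable";

    /** The class the job (for example PipesReducer) hands to the collector. */
    static String getOutputKeyClass(SketchConf conf) {
        return conf.get(OUTPUT_KEY_CLASS, DEFAULT_KEY_CLASS);
    }

    /** The class recorded for the SequenceFile; falls back to the job's output key class. */
    static String getSequenceFileOutputKeyClass(SketchConf conf) {
        return conf.get(SEQ_FILE_OUTPUT_KEY_CLASS, getOutputKeyClass(conf));
    }

    public static void main(String[] args) {
        SketchConf conf = new SketchConf();
        // Pipes case: the reducer emits raw bytes.
        conf.set(OUTPUT_KEY_CLASS, "org.apache.hadoop.io.BytesWritable");
        System.out.println(getSequenceFileOutputKeyClass(conf));
        // Override only the class associated with the SequenceFile.
        conf.set(SEQ_FILE_OUTPUT_KEY_CLASS, "org.apache.hadoop.io.Text");
        System.out.println(getSequenceFileOutputKeyClass(conf));
        System.out.println(getOutputKeyClass(conf));
    }
}
```

With this shape, jobs that never set the SequenceFile-specific property behave as before, and Pipes jobs can keep BytesWritable as the job output class while naming a different class for the file.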
* The generic signature on the RecordWriter can be <BytesWritable,BytesWritable> if the signature on SeqFileOF were correct:
Done. Modified SequenceFile.java. Added @SuppressWarnings("unchecked") for MultipleSequenceFileOutputFormat.getBaseRecordWriter.
* Since record compression is not supported, it might be worthwhile to override OutputFormat::checkOutputSpecs and throw if it's attempted
Done. Test added.
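Roughly, the check has the following shape. This is a sketch only: SketchCompressionType is a hypothetical stand-in for the SequenceFile compression type, the two parameters stand in for values read from the job configuration, and IllegalStateException stands in for whatever exception the real checkOutputSpecs throws.

```java
/** Hypothetical stand-in for the SequenceFile compression type. */
enum SketchCompressionType { NONE, RECORD, BLOCK }

final class BinaryOutputSpecCheck {
    /**
     * Rejects the job up front when compressed output is requested with
     * RECORD compression, which this output format does not support.
     */
    static void checkOutputSpecs(boolean compressOutput, SketchCompressionType type) {
        if (compressOutput && type == SketchCompressionType.RECORD) {
            throw new IllegalStateException(
                "SequenceFileAsBinaryOutputFormat doesn't support record compression");
        }
    }

    public static void main(String[] args) {
        // Allowed: block compression, or RECORD when compression is turned off.
        checkOutputSpecs(true, SketchCompressionType.BLOCK);
        checkOutputSpecs(false, SketchCompressionType.RECORD);
        try {
            checkOutputSpecs(true, SketchCompressionType.RECORD);
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Failing in checkOutputSpecs means the job is refused at submission time rather than partway through writing output.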
* This should be in o.a.h.mapred.lib rather than o.a.h.mapred
Yes. Except that SequenceFileAsBinaryInputFormat is in o.a.h.mapred.
For now, I'll leave this in o.a.h.mapred, and we can create a new Jira to move both of them to o.a.h.mapred.lib.
* Keeping a WritableValueBytes instance around (and adding a reset method) might be useful, so a new one isn't created for each write.
Done. (Not sure if I did it correctly.)
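For reference, the reuse pattern as I understand it, as a self-contained sketch. ReusableValueBytes is a hypothetical simplification of WritableValueBytes: it holds a plain byte array instead of a BytesWritable and writes to a ByteArrayOutputStream so it runs without Hadoop.

```java
import java.io.ByteArrayOutputStream;

/** Hypothetical, simplified stand-in for WritableValueBytes. */
final class ReusableValueBytes {
    private byte[] buf = new byte[0];
    private int length;

    /** Points this holder at the next value; no copy and no new holder object. */
    void reset(byte[] buf, int length) {
        this.buf = buf;
        this.length = length;
    }

    int getSize() {
        return length;
    }

    /** Writes only the valid prefix of the current buffer. */
    void writeUncompressedBytes(ByteArrayOutputStream out) {
        out.write(buf, 0, length);
    }

    public static void main(String[] args) {
        // One holder is created once and reused for every record written.
        ReusableValueBytes holder = new ReusableValueBytes();
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        byte[][] values = { {1, 2, 3}, {4, 5} };
        for (byte[] value : values) {
            holder.reset(value, value.length);
            holder.writeUncompressedBytes(sink);
        }
        System.out.println(sink.size() + " bytes written");
    }
}
```

The writer would keep a single holder as a field and call reset at the start of each write, so the per-record allocation goes away.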
* The IllegalArgumentException in WritableValueBytes should probably be an UnsupportedOperationException
* WritableValueBytes should be a static inner class
* The indentation on the anonymous RecordWriter::close should be consistent with the standards