Attaching a patch that provides AvroInputFormat/AvroOutputFormat.
AvroInputFormat allows you to set its input schema in the job configuration, and provides static methods for doing so. Depending on the input serialization metadata, it deserializes to generic, reflect, or specific-based classes.
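To illustrate the idea only, here is a rough sketch of the "static setter plus metadata-driven mode selection" pattern. It is not code from the patch: the class name, the two key names, and the GENERIC default are all hypothetical, and a plain Map stands in for the job configuration so the example runs without Hadoop or Avro on the classpath.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch; none of these names come from the attached patch.
class AvroModeSketch {
    // Assumed metadata keys, invented for this example.
    static final String SCHEMA_KEY = "sketch.avro.input.schema";
    static final String MODE_KEY = "sketch.avro.input.mode";

    // The three deserialization styles mentioned above.
    enum Mode { GENERIC, REFLECT, SPECIFIC }

    // Static setter: records the input schema (as JSON text) in the configuration.
    static void setInputSchema(Map<String, String> conf, String schemaJson) {
        conf.put(SCHEMA_KEY, schemaJson);
    }

    // Static setter: records which deserialization style the job wants.
    static void setInputMode(Map<String, String> conf, Mode mode) {
        conf.put(MODE_KEY, mode.name());
    }

    // Picks the deserialization style from the metadata.
    // Falling back to GENERIC when nothing is set is an assumption of this sketch.
    static Mode chooseMode(Map<String, String> conf) {
        String raw = conf.get(MODE_KEY);
        if (raw == null) {
            return Mode.GENERIC;
        }
        return Mode.valueOf(raw.toUpperCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        setInputSchema(conf, "{\"type\": \"string\"}");
        setInputMode(conf, Mode.SPECIFIC);
        System.out.println("schema = " + conf.get(SCHEMA_KEY));
        System.out.println("mode   = " + chooseMode(conf));
    }
}
```

In the real input format, the chosen mode would presumably decide which kind of Avro datum reader gets constructed; the sketch stops at the selection step because the patch's actual wiring is not shown in this comment.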
This patch includes unit tests for both of these classes.
I have also extended the jobdata API to allow you to set output serialization metadata (rather than only a class name), in the same fashion that MAPREDUCE-1126 allowed you to set intermediate serialization metadata. This deprecates the old methods such as JobConf.setOutputKeyClass(). Note that PipesMapRunner/PipesReducer, MapFileOutputFormat, and SequenceFileOutputFormat still rely on these now-deprecated APIs. MAPREDUCE-1360 will require a Hadoop-core-project JIRA that allows SequenceFile to handle non-class-based serialization; that will update at least the SequenceFile IF/OF APIs. Handling Pipes is a separate issue.
This cannot be submitted to the patch queue until a small change is made to the Hadoop-core API (issue is linked) and Hadoop is upgraded across the board to Avro 1.3. I'll mark this patch-available when both have happened.