Details
Type: Bug
Status: Open
Priority: Minor
Resolution: Unresolved
Description
I have a map-only MapReduce job which takes Avro input and writes non-Avro output (Hadoop Writables).
The mapper is implemented as a standard Hadoop mapper with <AvroWrapper<IN>,NullWritable,Text,Text> type parameters.
In the job setup I assumed it was safe to call AvroJob.setInputSchema(conf, MySchema.SCHEMA$), but this call makes assumptions about what the map output will be. Internally, AvroJob.setInputSchema calls configureAvroInput, which in turn calls configureAvroShuffle, so my map output key/value types and serialization settings all end up set incorrectly for my use case.
This is confusing behaviour: what appears to be a simple setter has undocumented side effects.
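A minimal sketch of the setup described above, assuming the old org.apache.avro.mapred API. Class names such as MyJob and MySchema are placeholders, and re-asserting the output types after the AvroJob call is only a possible workaround, not a confirmed fix:

```java
import org.apache.avro.mapred.AvroJob;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;

// Hypothetical driver for a map-only job with Avro input and Writable output.
JobConf conf = new JobConf(MyJob.class);

// Intent: declare only the Avro *input* schema.
AvroJob.setInputSchema(conf, MySchema.SCHEMA$);

// Side effect: the call above also runs configureAvroShuffle, which
// reconfigures the map output key/value classes and serialization
// as if the map output were Avro, clobbering the settings below.

// Possible workaround: re-assert the non-Avro output types afterwards.
conf.setMapOutputKeyClass(Text.class);
conf.setMapOutputValueClass(Text.class);
conf.setOutputKeyClass(Text.class);
conf.setOutputValueClass(Text.class);
conf.setNumReduceTasks(0);  // map-only job, so the shuffle settings should be irrelevant
```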