Description
AvroMultipleOutputs sets the map output key and value schemas, rather than the output schemas, when running a map-only job:
    boolean isMaponly = job.getNumReduceTasks() == 0;
    if (keySchema != null) {
      if (isMaponly)
        AvroJob.setMapOutputKeySchema(job, keySchema);
      else
        AvroJob.setOutputKeySchema(job, keySchema);
    }
    if (valSchema != null) {
      if (isMaponly)
        AvroJob.setMapOutputValueSchema(job, valSchema);
      else
        AvroJob.setOutputValueSchema(job, valSchema);
    }
Unfortunately, AvroKeyOutputFormat and AvroKeyValueOutputFormat never check whether the job is map-only; they always use the OutputKeySchema and OutputValueSchema. In a map-only job they therefore do not see the schemas that AvroMultipleOutputs set.
We can fix this in one of two ways:
- Change AvroKeyOutputFormat and AvroKeyValueOutputFormat to check whether the job is map-only and use the appropriate schema. (This seems like the right fix.)
- Change AvroMultipleOutputs to always set the OutputKeySchema and OutputValueSchema, regardless of whether the job is map-only.
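
The first option comes down to a small selection rule inside the output formats. The sketch below only models that rule and is not a patch: the class SchemaSelection and the method selectWriterSchema are hypothetical names, and the schemas are passed as plain generic values so the snippet compiles without Avro or Hadoop on the classpath. In the real output formats the three arguments would presumably come from context.getNumReduceTasks(), AvroJob.getMapOutputKeySchema(conf) and AvroJob.getOutputKeySchema(conf) (and the value-schema equivalents). Falling back to the output schema when no map output schema is set is an assumption, made so that map-only jobs that only call setOutputKeySchema keep working.

```java
// Hypothetical helper modelling the schema choice for option 1.
// S stands in for org.apache.avro.Schema so the sketch has no dependencies.
final class SchemaSelection {

    private SchemaSelection() {
        // static helper only
    }

    /**
     * Picks the schema an output format should write with.
     *
     * @param numReduceTasks  number of reduce tasks configured for the job
     * @param mapOutputSchema schema set via setMapOutput*Schema, may be null
     * @param outputSchema    schema set via setOutput*Schema, may be null
     */
    static <S> S selectWriterSchema(int numReduceTasks, S mapOutputSchema, S outputSchema) {
        boolean isMapOnly = numReduceTasks == 0;
        if (isMapOnly && mapOutputSchema != null) {
            // Map-only job: the mapper's output is the job's final output.
            return mapOutputSchema;
        }
        // Job with reducers, or (assumed fallback) no map output schema set.
        return outputSchema;
    }
}
```

With this rule a map-only job configured through AvroMultipleOutputs would write with the map output schema, while jobs with reducers would behave as they do today.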