Details
Improvement
Status: Open
Major
Resolution: Unresolved
1.5.4
None
None
any
Description
The current implementation of Avro MapRed is designed to be configured through JobConf. It is possible to use a job.xml file instead, but it is painful: you have to copy and paste the full schemas for input and output into the file. This is error-prone and time-consuming. In addition, any change to a bean means copying and pasting its schema again, whereas with JobConf a simple recompile is enough.
One way to improve this while staying backward compatible would be to introduce new keys in AvroJob that reference the actual Avro bean class instead of its schema. The class keys would be used only as a fallback when the corresponding schema key is not set.
New keys would be created, one for each existing schema key (existing key > new key):
- avro.input.schema > avro.input.class
- avro.map.output.schema > avro.map.output.class
- avro.output.schema > avro.output.class
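Only three methods in AvroJob would be affected:
- getInputSchema(Configuration job)
{
// Prefer the schema string; fall back to the class key when it is absent.
String s = job.get(INPUT_SCHEMA);
if (s != null) return Schema.parse(s);
try {
// SCHEMA$ on a generated bean is a static Schema field, so no parsing is needed.
return (Schema) Class.forName(job.get(INPUT_CLASS)).getDeclaredField("SCHEMA$").get(null);
} catch (Exception e) {
throw new RuntimeException("Cannot read SCHEMA$ from class named by " + INPUT_CLASS, e);
}
}
- getMapOutputSchema()
- getOutputSchema()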
For consistency, matching setters could also be added to AvroJob. They are not mandatory, since in this use case the new keys are set directly in the job configuration (job.xml) rather than through AvroJob.
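The fallback and the optional setter described above can be sketched without Avro or Hadoop on the classpath. This is only an illustration under stated assumptions, not the actual AvroJob code: java.util.Properties stands in for the Hadoop Configuration, the class SchemaFallbackSketch and its methods are hypothetical names, and UserBean is a hypothetical stand-in whose SCHEMA$ is a plain String (generated Avro beans expose an org.apache.avro.Schema object instead).

```java
import java.util.Properties;

public class SchemaFallbackSketch {
    // Key names taken from the proposal: existing schema key and new class key.
    public static final String INPUT_SCHEMA = "avro.input.schema";
    public static final String INPUT_CLASS = "avro.input.class";

    /** Hypothetical stand-in for a generated bean; SCHEMA$ is a String here for simplicity. */
    public static class UserBean {
        public static final String SCHEMA$ =
            "{\"type\":\"record\",\"name\":\"UserBean\",\"fields\":[]}";
    }

    /** Optional setter: stores the bean's class name under the new key. */
    public static void setInputClass(Properties job, Class<?> beanClass) {
        job.setProperty(INPUT_CLASS, beanClass.getName());
    }

    /** Returns the schema JSON, preferring the schema key and falling back to the class key. */
    public static String resolveInputSchema(Properties job) {
        String schema = job.getProperty(INPUT_SCHEMA);
        if (schema != null) {
            return schema; // backward-compatible path: an explicit schema always wins
        }
        String className = job.getProperty(INPUT_CLASS);
        if (className == null) {
            throw new IllegalStateException(
                "Neither " + INPUT_SCHEMA + " nor " + INPUT_CLASS + " is set");
        }
        try {
            // Read the public static SCHEMA$ field reflectively; null target for static fields.
            Object value = Class.forName(className).getField("SCHEMA$").get(null);
            return String.valueOf(value);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot read SCHEMA$ from " + className, e);
        }
    }

    public static void main(String[] args) {
        Properties job = new Properties();
        setInputClass(job, UserBean.class);
        System.out.println(resolveInputSchema(job));
    }
}
```

Because the schema key is checked first, existing jobs that set avro.input.schema behave exactly as before; only jobs that omit it reach the reflective lookup.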