Here is a functional patch for the mapred (Old) APIs, with a reflect-based test case that illustrates a sample join operation.
I've not yet delved into the mapreduce (New) APIs, but support for them would be implemented in nearly the same way.
Any comments on the approach before I begin work on the mapreduce equivalent?
Here are some implementation points:
- It only works for Specific- and Reflect-based MR jobs that use the mapred.AvroInputFormat and mapred.AvroMapper/mapred.AvroReducer classes.
- Only the schema and the mapper class can be configured per path.
- Unlike its Apache Hadoop equivalent, it offers no flexibility in the input format class.
- Passing a schema when adding an input path is mandatory.
- Passing a mapper class when adding an input path is also mandatory.
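
To make the per-path points above concrete, here is a rough, library-free sketch of the idea: every input path is registered together with a mandatory mapper class and a mandatory schema, and both are kept as plain configuration strings. This is not the code from the patch. The class name, the configuration keys, the Base64 encoding, and the use of a Map in place of a JobConf are all assumptions made purely for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical model, not the patch's code: a plain Map stands in for JobConf.
final class PerPathInputsSketch {
    // Hypothetical configuration keys, chosen only for this sketch.
    static final String MAPPERS_KEY = "sketch.multipleinputs.dir.mappers";
    static final String SCHEMAS_KEY = "sketch.multipleinputs.dir.schemas";

    private PerPathInputsSketch() {}

    // Registers one input path. Mapper class and schema are both mandatory.
    static void addInputPath(Map<String, String> conf, String path,
                             String mapperClass, String schemaJson) {
        require(path, "path");
        require(mapperClass, "mapper class");
        require(schemaJson, "schema");
        // Base64 keeps the commas in schema JSON away from the ',' entry separator.
        String encoded = Base64.getEncoder()
                .encodeToString(schemaJson.getBytes(StandardCharsets.UTF_8));
        append(conf, MAPPERS_KEY, path + ";" + mapperClass);
        append(conf, SCHEMAS_KEY, path + ";" + encoded);
    }

    // Returns path -> mapper class name, in registration order.
    static Map<String, String> mappersByPath(Map<String, String> conf) {
        return parse(conf.get(MAPPERS_KEY), false);
    }

    // Returns path -> schema JSON, in registration order.
    static Map<String, String> schemasByPath(Map<String, String> conf) {
        return parse(conf.get(SCHEMAS_KEY), true);
    }

    private static void require(String value, String what) {
        if (value == null || value.isEmpty()) {
            throw new IllegalArgumentException(what + " is mandatory");
        }
    }

    private static void append(Map<String, String> conf, String key, String entry) {
        conf.merge(key, entry, (old, added) -> old + "," + added);
    }

    // Limitation of this sketch: paths containing ',' are not supported.
    private static Map<String, String> parse(String value, boolean decode) {
        Map<String, String> out = new LinkedHashMap<>();
        if (value == null) {
            return out;
        }
        for (String entry : value.split(",")) {
            int sep = entry.lastIndexOf(';');
            String payload = entry.substring(sep + 1);
            if (decode) {
                payload = new String(Base64.getDecoder().decode(payload),
                        StandardCharsets.UTF_8);
            }
            out.put(entry.substring(0, sep), payload);
        }
        return out;
    }
}
```

In a join, each source path would be registered once with its own mapper and schema, and the task side would look up both by the path of the split it is reading. There is deliberately no input format parameter in this sketch, which mirrors the restriction listed above.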