I noticed the trunk GenericData and my interface were similar, but there were a few issues which led me to create a separate interface for this patch:
(1) GenericData only has a single-schema version of createDatumReader(), not the separate writer- and reader-schema version this patch needs. This could be fixed by adding another method to the class, or by calling setSchema() on the new instances as needed.
(2) To match the existing implementation, under the reflect data model the DatumReader needs to use the class loader from the job configuration. The obvious solution is to make the class Configurable, but that means it would have to be a new Hadoop-dependent class rather than a class in the base Avro package. This could be solved with a new class, e.g. HadoopReflectData (ConfiguredReflectData?), with some complications, described below.
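A rough sketch of the HadoopReflectData/ConfiguredReflectData idea follows. The class name and shape are assumptions for illustration, not existing Avro API. One complication shows up immediately: ReflectData's class loader is fixed at construction, so a Configurable version ends up rebuilding a delegate whenever setConf() is called:

```java
import org.apache.avro.reflect.ReflectData;
import org.apache.hadoop.conf.Configurable;
import org.apache.hadoop.conf.Configuration;

// Hypothetical: a Configurable reflect data model that resolves classes
// through the job configuration's class loader.
public class ConfiguredReflectData implements Configurable {
  private Configuration conf;
  private ReflectData data = ReflectData.get();

  @Override public void setConf(Configuration conf) {
    this.conf = conf;
    // Rebuild the delegate around the job's class loader, since the
    // loader can only be supplied via ReflectData's constructor.
    this.data = new ReflectData(conf.getClassLoader());
  }

  @Override public Configuration getConf() { return conf; }

  public ReflectData getData() { return data; }
}
```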
(3) GenericData seems to be an implementation detail, and tying the data model to a GenericData subclass intertwines interface with implementation. For example, my current implementation for Clojure data structures doesn't include a subclass of GenericData. For another, each subclass of GenericData needs to correctly override createDatumReader() etc. to return instances of the correct classes; any existing subclass which doesn't override the new methods will silently produce incorrect results at runtime. I think your latter proposal is the way to fix this long term. Short term, if you're okay with the initial implementation using GenericData directly, I'm not going to argue against getting a feature I need sooner, but I'm also not excited about changing the code again later to implement a new interface.
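The silent-failure hazard in (3) can be shown with a self-contained toy (plain Java, no Avro; all class names here are made up for illustration). The base class's factory method compiles fine in a subclass that forgets to override it, and callers only find out at runtime that they got the generic reader:

```java
// A data-model base class with a factory method subclasses must remember
// to override -- the same shape as GenericData.createDatumReader().
class GenericModel {
  static class Reader {
    String kind() { return "generic"; }
  }
  Reader createReader() { return new Reader(); }
}

// Correct subclass: overrides the factory to return its own reader type.
class ReflectModel extends GenericModel {
  static class ReflectReader extends Reader {
    @Override String kind() { return "reflect"; }
  }
  @Override Reader createReader() { return new ReflectReader(); }
}

// Buggy subclass: createReader() is NOT overridden.  No compile error,
// but every caller silently gets the generic reader at runtime.
class ClojureModel extends GenericModel {
}

public class OverrideHazard {
  public static void main(String[] args) {
    System.out.println(new ReflectModel().createReader().kind()); // reflect
    System.out.println(new ClojureModel().createReader().kind()); // generic -- silently wrong
  }
}
```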
If the short-term fixes I mention in the above points seem acceptable, I'll work up another version of the patch.