Description
I've been working on a fairly major patch that changes how the C library implements schema resolution. Previously, we compared the writer and reader schemas every time we read a record from an Avro file. This is a fair bit of wasted effort, since the outcome of that comparison is the same for every record. The new implementation splits schema resolution and binary parsing into separate operations. There's a new "consumer" API, which defines a set of callbacks for processing Avro data that conforms to a schema. The new avro_consume_binary function reads binary-encoded Avro data from a buffer or file and passes that data into a consumer instance. Each consumer instance is associated with the writer schema of the data it expects to process.
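To make the split between parsing and consuming concrete, here is a minimal sketch in C. Everything in it is hypothetical: `demo_consumer`, `demo_consume_binary`, and `store_long` are illustrative names, not the actual avro-c API, and the writer schema is assumed to be a record whose fields are all `long`s. The decoding step does follow the Avro binary encoding for `long` (base-128 varint, then zig-zag). The parser knows nothing about the reader schema; it simply hands each decoded value to a callback.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical consumer: a callback table plus caller-owned state.
 * Illustrative only; not the avro-c consumer API. */
typedef struct demo_consumer {
    int (*long_value)(struct demo_consumer *c, size_t field, int64_t value);
    void *user_data;
} demo_consumer;

/* Decode one Avro long: base-128 varint (low group first), then zig-zag.
 * Returns the number of bytes consumed, or 0 on truncated/overlong input. */
static size_t decode_long(const uint8_t *buf, size_t len, int64_t *out)
{
    uint64_t raw = 0;
    unsigned shift = 0;
    for (size_t i = 0; i < len && shift < 64; i++, shift += 7) {
        raw |= (uint64_t)(buf[i] & 0x7F) << shift;
        if ((buf[i] & 0x80) == 0) {
            *out = (int64_t)((raw >> 1) ^ (0 - (raw & 1))); /* undo zig-zag */
            return i + 1;
        }
    }
    return 0;
}

/* Hypothetical stand-in for avro_consume_binary, limited to a writer schema
 * that is a record of `nfields` longs. It decodes each field and passes it
 * to the consumer; it never looks at a reader schema. */
static int demo_consume_binary(const uint8_t *buf, size_t len, size_t nfields,
                               demo_consumer *c)
{
    assert(c != NULL && c->long_value != NULL);
    size_t pos = 0;
    for (size_t f = 0; f < nfields; f++) {
        int64_t value;
        size_t n = decode_long(buf + pos, len - pos, &value);
        if (n == 0)
            return -1; /* truncated or malformed input */
        pos += n;
        if (c->long_value(c, f, value) != 0)
            return -1; /* consumer rejected the value */
    }
    return 0;
}

/* Example consumer callback: store each field into an int64_t array. */
static int store_long(demo_consumer *c, size_t field, int64_t value)
{
    int64_t *slots = c->user_data;
    slots[field] = value;
    return 0;
}
```

A caller fills a `demo_consumer` with a callback and a pointer to its own state, then calls `demo_consume_binary(buf, len, nfields, &c)`. For example, the bytes `02 03 80 01` decode to the three values 1, -2, and 64.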
Schema resolution is now implemented in the new avro_resolver_new function, which returns a consumer instance that knows how to translate from the writer schema to the reader schema. As the resolver receives data via the consumer API, it fills in the contents of a destination avro_datum_t (which should be an instance of the reader schema).
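The point of the resolver is that the writer/reader comparison happens once, when the resolver is built, rather than once per record. The sketch below assumes a drastically simplified schema model (a record of named `long` fields) and uses hypothetical names (`demo_record_schema`, `demo_resolver_new`, `demo_resolver_long`); it is not the patch's `avro_resolver_new`. Fields are matched by name, writer-only fields are skipped, and a reader field missing from the writer is treated as an error, because this sketch does not model default values.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified schema model: a record of named long fields. */
typedef struct {
    size_t nfields;
    const char *const *names;
} demo_record_schema;

/* Hypothetical resolver state, built once per (writer, reader) pair. */
typedef struct {
    size_t nwriter;
    long *writer_to_reader; /* reader index per writer field, or -1 to skip */
    int64_t *dest;          /* destination record, in reader field order */
} demo_resolver;

static void demo_resolver_free(demo_resolver *r)
{
    if (r != NULL) {
        free(r->writer_to_reader);
        free(r);
    }
}

/* Compare the schemas once and record where each writer field should go.
 * Returns NULL if a reader field has no matching writer field. */
static demo_resolver *demo_resolver_new(const demo_record_schema *writer,
                                        const demo_record_schema *reader,
                                        int64_t *dest)
{
    demo_resolver *r = malloc(sizeof *r);
    if (r == NULL)
        return NULL;
    r->nwriter = writer->nfields;
    r->dest = dest;
    r->writer_to_reader = calloc(writer->nfields ? writer->nfields : 1,
                                 sizeof *r->writer_to_reader);
    if (r->writer_to_reader == NULL) {
        free(r);
        return NULL;
    }
    for (size_t w = 0; w < writer->nfields; w++) {
        r->writer_to_reader[w] = -1; /* writer-only field: skip it */
        for (size_t i = 0; i < reader->nfields; i++) {
            if (strcmp(writer->names[w], reader->names[i]) == 0)
                r->writer_to_reader[w] = (long)i;
        }
    }
    for (size_t i = 0; i < reader->nfields; i++) {
        int found = 0;
        for (size_t w = 0; w < writer->nfields; w++) {
            if (r->writer_to_reader[w] == (long)i)
                found = 1;
        }
        if (!found) {
            demo_resolver_free(r);
            return NULL;
        }
    }
    return r;
}

/* Consumer-style callback, invoked once per decoded writer field. */
static int demo_resolver_long(demo_resolver *r, size_t writer_field,
                              int64_t value)
{
    assert(r != NULL && writer_field < r->nwriter);
    long target = r->writer_to_reader[writer_field];
    if (target >= 0)
        r->dest[target] = value;
    return 0;
}
```

Per record, the only work left is one table lookup per field; all of the name comparisons happen in `demo_resolver_new`.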
This work isn't complete yet: I still have to implement promotion (int->long and friends) and add support for recursive schemas (via the AVRO_LINK schema type). But I wanted to get the patch out there for people to review and test in the meantime. This patch depends on a few of my other patches that haven't made it into SVN yet; if you want to test the code without applying them yourself, I have a tracking branch on GitHub.