Please package this as a single patch file that replaces the existing implementation.
Sure, I will address this on Sunday, when I return to the States.
Some of the TODOs seem critical, like skip_int.
skip_int and skip_long are copied from the old Python implementation. I believe they are broken, but this patch doesn't introduce the problem. I plan to add tests and sort out that issue soon, but can I address the TODOs in separate JIRAs? Blocking the commit of this patch for TODO scrubbing will mean more work outside of Apache's SVN.
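For reference while that issue is sorted out, here is a minimal sketch of what skipping a long could look like. It assumes the variable-length zigzag encoding that the Avro specification uses for int and long, where every byte except the last has its high bit set. The function name and the file-like `reader` argument are illustrative only; this is not the code in the patch.

```python
import io


def skip_long(reader):
    """Advance past one variable-length encoded long without decoding it.

    `reader` is assumed to be a binary file-like object (hypothetical
    interface, not the patch's decoder class).
    """
    while True:
        byte = reader.read(1)
        if not byte:
            raise EOFError("stream ended inside a variable-length long")
        # The high bit is a continuation flag; a clear bit marks the last byte.
        if ord(byte) & 0x80 == 0:
            return


# Usage: the first value occupies two bytes, the second occupies one.
stream = io.BytesIO(b"\x80\x01\x02")
skip_long(stream)
first_end = stream.tell()
skip_long(stream)
second_end = stream.tell()
```

Because int and long share the same wire encoding, a skip_int under the same assumption could simply delegate to this function.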
Those big if ... elif chains in read_data, write_data and skip_data look like performance pits.
The comments on that blog post point out that a big if/(elif)+/else block is the standard way to approximate switch/case in Python. Simon's idiom is less popular in the Python code I've seen. The previous implementation built a dict of function calls, similar to the approach in the blog post you point to, and I found that to be unnecessarily complex. My goal with the Python code is to be correct, concise, and easy to understand first, and fast second. Can we keep the current approach and benchmark it in AVRO-217?
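To make the two idioms under discussion concrete, here is a small side-by-side sketch. `ToyDecoder`, its methods, and both `read_data_*` functions are hypothetical stand-ins written for this comparison; they are not the read_data from the patch or the dict from the previous implementation.

```python
class ToyDecoder(object):
    """Hypothetical decoder that returns fixed values instead of reading bytes."""

    def read_null(self):
        return None

    def read_boolean(self):
        return True

    def read_int(self):
        return 42


def read_data_if_chain(schema_type, decoder):
    # Idiom 1: an if/elif/else chain standing in for switch/case.
    if schema_type == "null":
        return decoder.read_null()
    elif schema_type == "boolean":
        return decoder.read_boolean()
    elif schema_type == "int":
        return decoder.read_int()
    else:
        raise ValueError("unknown schema type: %s" % schema_type)


# Idiom 2: a dict that maps each type name to the function that reads it.
READERS = {
    "null": ToyDecoder.read_null,
    "boolean": ToyDecoder.read_boolean,
    "int": ToyDecoder.read_int,
}


def read_data_dispatch(schema_type, decoder):
    try:
        reader = READERS[schema_type]
    except KeyError:
        raise ValueError("unknown schema type: %s" % schema_type)
    return reader(decoder)


decoder = ToyDecoder()
results = [
    (read_data_if_chain(t, decoder), read_data_dispatch(t, decoder))
    for t in ("null", "boolean", "int")
]
```

Both versions return the same values for every type shown. Which one is faster in practice would need to be measured, which is what the benchmark proposed for AVRO-217 is for.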
Calling validate is overkill for picking the union branch.
Your suggestion sounds like a performance optimization to avoid calling validate() many times, but one that would further obscure what the code does. I don't think it's a good idea at this time, given the aims stated above for the Python implementation. If I've misunderstood your intent, please correct me.
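For readers following along, the approach being defended can be sketched roughly as below: try each branch of the union in order and take the first one the datum validates against. The `validate` here is a toy that only understands a few primitive type names, and `pick_union_branch` is a hypothetical name; neither is the code from the patch.

```python
def validate(schema_type, datum):
    """Toy validator for a few primitive type names (illustrative only)."""
    if schema_type == "null":
        return datum is None
    if schema_type == "boolean":
        return isinstance(datum, bool)
    if schema_type == "int":
        # bool is a subclass of int in Python, so exclude it explicitly.
        return isinstance(datum, int) and not isinstance(datum, bool)
    if schema_type == "string":
        return isinstance(datum, str)
    return False


def pick_union_branch(branches, datum):
    """Return the index of the first branch that the datum validates against."""
    for index, branch in enumerate(branches):
        if validate(branch, datum):
            return index
    raise ValueError("datum %r matches no branch of the union" % (datum,))


union = ["null", "int", "string"]
null_index = pick_union_branch(union, None)
int_index = pick_union_branch(union, 7)
string_index = pick_union_branch(union, "seven")
```

The cost is one validate() call per branch tried, which is the overhead the reviewer is pointing at; the benefit is that branch selection reuses the same rule as validation instead of introducing a second one.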