I've put together a simple CLI tool in Python that does the following (with some tunable options):
CSV to Avro ->
1. Pass a schema file, or it generates one based on the CSV header with all string types.
2. Read each CSV record (from a list of input files), split it on the given delimiter (default ','), and convert each value to the type its schema field declares.
Note: if a data-type conversion raises an exception (say, a null where the schema expects a float), the tool checks whether that schema field has a default and uses it. Otherwise it throws an informative exception. I know this stretches the intended meaning of a schema 'default', but it's a great feature to have!
3. Write these records down into a data file.
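A minimal sketch of steps 1 and 2 above, using only the standard library. This is not the code from the repository: the names `CONVERTERS`, `schema_from_header` and `convert_row` are made up for illustration, only primitive field types are handled, and the actual write to an Avro data file (step 3) is left out.

```python
import csv
import io

# Hypothetical mapping from Avro primitive type names to Python converters.
CONVERTERS = {
    "string": str,
    "int": int,
    "long": int,
    "float": float,
    "double": float,
    "boolean": lambda s: s.strip().lower() in ("true", "1"),
}


def schema_from_header(header, name="CsvRecord"):
    """Build an all-string record schema (as a dict) from CSV column names."""
    return {
        "type": "record",
        "name": name,
        "fields": [{"name": col, "type": "string"} for col in header],
    }


def convert_row(row, schema):
    """Convert one CSV row (a list of strings) to a dict typed per the schema."""
    fields = schema["fields"]
    if len(row) != len(fields):
        raise ValueError(
            "expected %d columns, got %d: %r" % (len(fields), len(row), row)
        )
    record = {}
    for field, raw in zip(fields, row):
        convert = CONVERTERS[field["type"]]
        try:
            record[field["name"]] = convert(raw)
        except ValueError as exc:
            # Fall back to the field's default, if the schema has one.
            if "default" in field:
                record[field["name"]] = field["default"]
            else:
                raise ValueError(
                    "cannot convert %r to %s for field %r"
                    % (raw, field["type"], field["name"])
                ) from exc
    return record


# Example: the second row has an empty score, so the default is used.
data = io.StringIO("name,score\nalice,1.5\nbob,\n")
reader = csv.reader(data, delimiter=",")
header = next(reader)
schema = {
    "type": "record",
    "name": "Score",
    "fields": [
        {"name": "name", "type": "string"},
        {"name": "score", "type": "float", "default": 0.0},
    ],
}
records = [convert_row(row, schema) for row in reader]
```

If no schema file is passed, `schema_from_header(header)` would stand in for `schema`, and every value would simply stay a string.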
Avro to CSV ->
1. Pass a schema to read only selected fields. Otherwise it reads the file with its full schema.
2. Read each record [only works with record-type data for now] and convert all values to strings. It can read from many Avro files into a single CSV file.
3. Write to a csv file with an optional header.
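A rough sketch of the Avro-to-CSV direction, again not the repository code. It assumes the records have already been read from the Avro files as plain dicts (the reading itself is not shown), and `records_to_csv` and its parameters are hypothetical names.

```python
import csv
import io


def records_to_csv(records, field_names, out, delimiter=",", header=True):
    """Write dict records to `out` as CSV, keeping only `field_names`."""
    writer = csv.writer(out, delimiter=delimiter, lineterminator="\n")
    if header:
        writer.writerow(field_names)
    for record in records:
        # Every value is converted to its string form, as in step 2.
        writer.writerow([str(record[name]) for name in field_names])


# Stand-in for records decoded from one or more Avro files.
decoded = [
    {"name": "alice", "score": 1.5},
    {"name": "bob", "score": 0.0},
]

full = io.StringIO()
records_to_csv(decoded, ["name", "score"], full)

# Selecting a subset of fields mimics passing a narrower schema.
names_only = io.StringIO()
records_to_csv(decoded, ["name"], names_only, header=False)
```

Passing a shorter `field_names` list plays the role of the selective schema in step 1, and `header=False` corresponds to leaving out the optional header.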
Currently the code (WIP) resides on GitHub at: http://github.com/QwertyManiac/avroutils but I'll submit the stuff as a formal patch once it feels complete.
I'm posting this comment to gather suggestions, e.g. on what to extend.