I want to create a ToAvro() builtin that converts arbitrary pig fields, including complex types (bags, tuples, maps) to avro format as bytearrays.
This would enable storing Avro records in arbitrary data stores, for example HBaseAvroStorage in PIG-2889
See PIG-2641 for ToJson
This points to a greater need for customizable/pluggable serialization that plugin to storefuncs and do serialization independently. For example, we might do these operations:
a = load 'my_data' as (some_schema);
b = foreach a generate ToJson;
c = foreach a generate ToAvro;
store b into 'hbase://JsonValueTable' using HBaseStorage(...);
store c into 'hbase://AvroValueTable' using HBaseStorage(...);
I'll make a ticket for pluggable serialization separately.