Uploaded image for project: 'Pig'
  1. Pig
  2. PIG-3111

ToAvro to convert any Pig record to an Avro bytearray

    Details

    • Type: New Feature
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 0.12.0
    • Fix Version/s: 0.18.0
    • Component/s: data, internal-udfs
    • Labels:
      None

      Description

      I want to create a ToAvro() builtin that converts arbitrary pig fields, including complex types (bags, tuples, maps) to avro format as bytearrays.

      This would enable storing Avro records in arbitrary data stores, for example HBaseAvroStorage in PIG-2889

      See PIG-2641 for ToJson

      This points to a greater need for customizable/pluggable serialization that plugin to storefuncs and do serialization independently. For example, we might do these operations:

      a = load 'my_data' as (some_schema);
      b = foreach a generate ToJson;
      c = foreach a generate ToAvro;
      store b into 'hbase://JsonValueTable' using HBaseStorage(...);
      store c into 'hbase://AvroValueTable' using HBaseStorage(...);

      I'll make a ticket for pluggable serialization separately.

        Attachments

          Issue Links

            Activity

              People

              • Assignee:
                russell.jurney Russell Jurney
                Reporter:
                russell.jurney Russell Jurney
              • Votes:
                0 Vote for this issue
                Watchers:
                2 Start watching this issue

                Dates

                • Created:
                  Updated: