PIG-3111

ToAvro to convert any Pig record to an Avro bytearray


Details

    • Type: New Feature
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 0.12.0
    • Fix Version/s: 0.18.0
    • Component/s: data, internal-udfs
    • Labels: None

    Description

      I want to create a ToAvro() builtin that converts arbitrary Pig fields, including complex types (bags, tuples, maps), to Avro format as bytearrays.

      This would enable storing Avro records in arbitrary data stores, for example via HBaseAvroStorage in PIG-2889.

      See PIG-2641 for ToJson

      This points to a broader need for customizable, pluggable serialization that plugs into StoreFuncs and performs serialization independently of the storage layer. For example, we might do these operations:

      -- ToJson/ToAvro are the proposed builtins; passing * (all fields) is an assumed calling convention
      a = load 'my_data' as (some_schema);
      b = foreach a generate ToJson(*);
      c = foreach a generate ToAvro(*);
      store b into 'hbase://JsonValueTable' using HBaseStorage(...);
      store c into 'hbase://AvroValueTable' using HBaseStorage(...);

      I'll make a ticket for pluggable serialization separately.
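
      To make concrete what "a Pig record as an Avro bytearray" could look like, here is a minimal sketch of Avro's binary encoding for a flat record, using only the Java standard library. It is not the proposed implementation: a real ToAvro would presumably be a Pig UDF that derives an Avro schema from the Pig schema and delegates to the Avro library. The class name AvroSketch and the record layout (name: string, count: long) are hypothetical, chosen only for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration only: hand-rolled Avro binary encoding
// for a flat record (name: string, count: long).
public class AvroSketch {

    // Avro writes int/long values zigzag-encoded, then as a base-128 varint.
    public static void writeLong(ByteArrayOutputStream out, long n) {
        long z = (n << 1) ^ (n >> 63);          // zigzag: small magnitudes stay small
        while ((z & ~0x7FL) != 0) {
            out.write((int) ((z & 0x7F) | 0x80)); // low 7 bits, continuation bit set
            z >>>= 7;
        }
        out.write((int) z);                      // final byte, continuation bit clear
    }

    // Avro writes a string as its UTF-8 byte length (a long) followed by the bytes.
    public static void writeString(ByteArrayOutputStream out, String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        writeLong(out, utf8.length);
        out.write(utf8, 0, utf8.length);
    }

    // An Avro record is the concatenation of its fields in schema order.
    public static byte[] encodeRecord(String name, long count) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeString(out, name);
        writeLong(out, count);
        return out.toByteArray();
    }

    // Helper to render the resulting bytearray for inspection.
    public static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02x", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Encodes the record ("pig", 3) and prints the bytes as hex.
        System.out.println(toHex(encodeRecord("pig", 3L)));
    }
}
```

      The resulting byte[] is the kind of value that could be handed to a storefunc such as HBaseStorage as a single bytearray column. Complex types (bags, tuples, maps) would need Avro's array, record, and map encodings, which this sketch omits.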


          People

            Assignee: Russell Jurney (russell.jurney)
            Reporter: Russell Jurney (russell.jurney)
