Pig / PIG-2445

AvroStorage can't store two relations in one script

    Details

      Description

      STORE one INTO '/tmp/one.avro' USING AvroStorage();
      STORE two INTO '/tmp/two.avro' USING AvroStorage();

      -- relation two has the schema of relation one. BANG!

        Activity

        Cheolsoo Park made changes -
        Status: Open [ 1 ] -> Resolved [ 5 ]
        Resolution: Not A Problem [ 8 ]
        Cheolsoo Park added a comment -

        AvroStorage can store two relations in one script. In fact, the same question came up on the user group a while ago. I am copying my answer here:

        AvroStorage has a rather odd syntax for multiple stores. To apply different Avro schemas to multiple stores, you have to specify an "index" for each store, as follows:

        set1 = load 'input1.txt' using PigStorage('|') as ( ... );
        store set1 into 'set1' using org.apache.pig.piggybank.storage.avro.AvroStorage('index', '1');

        set2 = load 'input2.txt' using PigStorage('|') as ( .. );
        store set2 into 'set2' using org.apache.pig.piggybank.storage.avro.AvroStorage('index', '2');

        Note the added 'index' parameters.

        AvroStorage constructs the following string in the frontend:

        "1#<1st avro schema>,2#<2nd avro schema>"

        and passes it to the backend via UdfContext. In the backend, tasks parse this string to get the output schema for each store.
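
A minimal sketch of the index-to-schema round trip described above, written in Python for illustration only. The function names `encode_schemas` and `parse_schemas` are hypothetical and are not part of AvroStorage, and the schemas are the placeholder strings from the comment. Real Avro schemas are JSON and contain commas, so an actual implementation would need more careful delimiting than the naive split shown here.

```python
# Illustrative only: mimics the "index#schema" string described in the comment.
# encode_schemas and parse_schemas are hypothetical names, not AvroStorage methods.

def encode_schemas(schemas):
    """Join {index: schema} into a string such as "1#<schema>,2#<schema>"."""
    return ",".join(
        f"{index}#{schema}" for index, schema in sorted(schemas.items())
    )


def parse_schemas(encoded):
    """Split the encoded string back into {index: schema}.

    Assumes the schema text itself contains no commas (true only for
    the placeholder schemas used in this sketch).
    """
    result = {}
    for entry in encoded.split(","):
        index, _, schema = entry.partition("#")
        result[index] = schema
    return result


# "Frontend": build one string covering both stores.
encoded = encode_schemas({"1": "<1st avro schema>", "2": "<2nd avro schema>"})

# "Backend": the task for the store with index 2 looks up its own schema.
schema_for_store_2 = parse_schemas(encoded)["2"]
```

In this sketch each store finds its schema by its own index, which is why the two STORE statements need distinct 'index' values.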

        This is also documented on the AvroStorage wiki (see "index"). Admittedly, this is not very intuitive, so I have been thinking of writing a new AvroStorage with more intuitive options, although I haven't started yet.

        I think that we should close this JIRA. Please let me know if anyone has objections.

        Thanks!

        Dmitriy V. Ryaboy added a comment -

        Is this still an issue, Russ?

        Does anyone at LI want to contribute a patch to this? Or they could post their implementation to DataFu...

        Russell Jurney made changes -
        Field: Original Value -> New Value
        Summary: AvroStorage cam -> AvroStorage can't store two relations in one script
        Labels: avro fun happy pants pig pig_udf storefunc
        Affects Version/s: 0.9.1 [ 12317343 ]
        Affects Version/s: 0.10 [ 12316246 ]
        Affects Version/s: 0.9.2 [ 12318248 ]
        Description:
          STORE one INTO '/tmp/one.avro' USING AvroStorage();
          STORE two INTO '/tmp/two.avro' USING AvroStorage();

          -- relation two has the schema of relation one. BANG!
        Component/s: piggybank [ 12315410 ]
        Russell Jurney created issue -

          People

          • Assignee: Unassigned
          • Reporter: Russell Jurney
          • Votes: 0
          • Watchers: 3
