Uploaded image for project: 'Pig'
  1. Pig
  2. PIG-4583

Allow UDFs to return multiple values (not inside a Bag or a Tuple)

    XMLWordPrintableJSON

Details

    • Improvement
    • Status: Open
    • Major
    • Resolution: Unresolved
    • 0.14.0
    • None
    • None
    • None

    Description

      When writing a UDF, you are forced to return only one value, or multiple values but inside a Tuple or a Bag. Although this may not seem a big problem, when using Pig in production and performing multiple JOINs, GROUP BYs and any operations that make your schema more and more complicated, it would be nice to create a UDF to reduce the size of your schema, for example:

          rel1  = load 'a' using PigStorage(';', '-schema');
          rel2  = load 'b' using PigStorage(';', '-schema');
      
          joined = join rel1 by id_whatever, rel2 by id_whatever;
      
          ... perform operations
      
          another_rel = load 'c' using PigStorage('.'.'-schema');
          final_rel = join another_rel by id_whatever, joined by id_whatever;
      

      Will have an schema like:

          describe final_rel;
          rel1::joined::id_whatever, rel1:joined::field_1, ......
      

      When you have scripts with hundreds or thousands of lines of code, you end up having more foreachs to rename fields than with actual code. Therefore, I wrote a UDF to handle this so I wouldn't have to write a foreach to rename 100 fields one by one.

      However, due to Pig's limitation of returning only one value, I must place my return values inside a Tupe or a Bag, flatten it, and have another something:: for each of the fields.

      Can we remove this limitation? And if it is done, perhaps upload the UDF I wrote... I think it is a VERY useful function for production environments and large scripts.

      Attachments

        Activity

          People

            Unassigned Unassigned
            Carlos Balduz Carlos Balduz
            Votes:
            1 Vote for this issue
            Watchers:
            1 Start watching this issue

            Dates

              Created:
              Updated: