[PIG-4583] Allow UDFs to return multiple values (not inside a Bag or a Tuple) - ASF JIRA

XML

Word

Printable

JSON

Details

Type: Improvement
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: 0.14.0
Fix Version/s: None
Component/s: None
Labels:
None

Description

When writing a UDF, you are forced to return only one value, or multiple values but inside a Tuple or a Bag. Although this may not seem a big problem, when using Pig in production and performing multiple JOINs, GROUP BYs and any operations that make your schema more and more complicated, it would be nice to create a UDF to reduce the size of your schema, for example:

    rel1  = load 'a' using PigStorage(';', '-schema');
    rel2  = load 'b' using PigStorage(';', '-schema');

    joined = join rel1 by id_whatever, rel2 by id_whatever;

    ... perform operations

    another_rel = load 'c' using PigStorage('.'.'-schema');
    final_rel = join another_rel by id_whatever, joined by id_whatever;

Will have an schema like:

    describe final_rel;
    rel1::joined::id_whatever, rel1:joined::field_1, ......

When you have scripts with hundreds or thousands of lines of code, you end up having more foreachs to rename fields than with actual code. Therefore, I wrote a UDF to handle this so I wouldn't have to write a foreach to rename 100 fields one by one.

However, due to Pig's limitation of returning only one value, I must place my return values inside a Tupe or a Bag, flatten it, and have another something:: for each of the fields.

Can we remove this limitation? And if it is done, perhaps upload the UDF I wrote... I think it is a VERY useful function for production environments and large scripts.

Attachments

Activity

People

Assignee:: Unassigned

Reporter:: Carlos Balduz

Votes:: 1 Vote for this issue

Watchers:: 1 Start watching this issue

Dates

Created:: 03/Jun/15 10:34

Updated:: 03/Jun/15 10:39