Uploaded image for project: 'Pig'
  1. Pig
  2. PIG-2378

macros don't accept references to items within tuples as arguments

    Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 0.9.1
    • Fix Version/s: None
    • Component/s: None
    • Labels:
      None

      Description

      I'd like to be able to pass a reference to an item within a parameter to a Pig Macro.

      For example, suppose that I had a relation A with the schema A:

      {id:long, header:(time:long, type:chararray)}

      . I'd like to call a macro by typing:

      B = MY_MACRO(A, header.time);

      but this does not currently work. Obviously, I could define a new relation as a workaround, for example I could use some pig code like

      AA = FOREACH a GENERATE *, header.time as time;
      B = MY_MACRO(AA, time);

      But that's ugly and clunky

      1. PIG-2378.patch.txt
        2 kB
        Johnny Zhang
      2. PIG-2378.patch.txt
        2 kB
        Johnny Zhang

        Activity

        Hide
        dreambird Johnny Zhang added a comment -

        I think it is not a bug, using

        B = MY_MACRO(A, 'header.time');
        

        should just make it work.

        I try it in pig 0.11 release by below test:
        1. macro.pig

        a = load '/var/lib/jenkins/macro' as (f1:int, f2:int, f3:int);
        dump a;
        b = foreach a generate f1, (f2, f3) as f5;
        dump b;
        describe b;
        define macro1 (c, d) returns e {
        $e = group $c by $d;
        }
        e = macro1 (b, 'f5.f2');
        dump e;
        

        2. data macro

        1	2	3
        4	5	6
        7	8	9
        

        3. run the test by 'pig -x local macro.pig' gets result

        (2,{(1,(2,3))})
        (5,{(4,(5,6))})
        (8,{(7,(8,9))})
        

        Joseph Adler, could you please verify whether it works for you? Thanks.

        Show
        dreambird Johnny Zhang added a comment - I think it is not a bug, using B = MY_MACRO(A, 'header.time'); should just make it work. I try it in pig 0.11 release by below test: 1. macro.pig a = load '/var/lib/jenkins/macro' as (f1:int, f2:int, f3:int); dump a; b = foreach a generate f1, (f2, f3) as f5; dump b; describe b; define macro1 (c, d) returns e { $e = group $c by $d; } e = macro1 (b, 'f5.f2'); dump e; 2. data macro 1 2 3 4 5 6 7 8 9 3. run the test by 'pig -x local macro.pig' gets result (2,{(1,(2,3))}) (5,{(4,(5,6))}) (8,{(7,(8,9))}) Joseph Adler , could you please verify whether it works for you? Thanks.
        Hide
        dreambird Johnny Zhang added a comment -

        the work around throw casting exception in trunk right now. Plus, fixing PIG-2244 will close this back door. So looks like we need to make it works. Working on a patch.

        Show
        dreambird Johnny Zhang added a comment - the work around throw casting exception in trunk right now. Plus, fixing PIG-2244 will close this back door. So looks like we need to make it works. Working on a patch.
        Hide
        dreambird Johnny Zhang added a comment -

        this is a working patch resolving the issue. Now you don't have to use the quote hacky working around, you can use relation.filed directly as macro argument. For the test data I described above, it finally generate results

        ((2,3),{(1,(2,3))})
        ((2,4),{(4,(2,4))})
        ((7,8),{(6,(7,8))})
        

        I will run full unit tests see if any regression it brings, since it touches file LogicalSchema.java, which is used by many other places. Meanwhile, improve the code efficiency as much as possible.

        Show
        dreambird Johnny Zhang added a comment - this is a working patch resolving the issue. Now you don't have to use the quote hacky working around, you can use relation.filed directly as macro argument. For the test data I described above, it finally generate results ((2,3),{(1,(2,3))}) ((2,4),{(4,(2,4))}) ((7,8),{(6,(7,8))}) I will run full unit tests see if any regression it brings, since it touches file LogicalSchema.java, which is used by many other places. Meanwhile, improve the code efficiency as much as possible.
        Hide
        dreambird Johnny Zhang added a comment -

        latest patch improve minor issue (comments in code)

        Show
        dreambird Johnny Zhang added a comment - latest patch improve minor issue (comments in code)
        Hide
        dreambird Johnny Zhang added a comment -

        turn out it breaks several unit tests, cancel the current patch

        Show
        dreambird Johnny Zhang added a comment - turn out it breaks several unit tests, cancel the current patch
        Hide
        xuefuz Xuefu Zhang added a comment -

        The fix might not be at the grammar but at functionality extension. It might make sense to support free-text as parameters to be passed to macro.

        Show
        xuefuz Xuefu Zhang added a comment - The fix might not be at the grammar but at functionality extension. It might make sense to support free-text as parameters to be passed to macro.

          People

          • Assignee:
            dreambird Johnny Zhang
            Reporter:
            jadler Joseph Adler
          • Votes:
            0 Vote for this issue
            Watchers:
            2 Start watching this issue

            Dates

            • Created:
              Updated:

              Development