Pig
  1. Pig
  2. PIG-2530

Reusing alias name in nested foreach causes incorrect results

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Critical Critical
    • Resolution: Fixed
    • Affects Version/s: 0.9.2, 0.10.0
    • Fix Version/s: 0.10.0, 0.9.3, 0.11
    • Component/s: None
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      The below script results in incorrect output for Pig 0.10 but runs fine with Pig 0.8,

      input.txt
      1	4
      1	3
      2	3
      2	4
      
      a = load 'input.txt' as (v1:int, v2:int);
      b = group a by v1;
      c = foreach b { x = a; x = order x by v2 asc; generate flatten(x); }
      store c into 'c1';
      

      Output from Pig 0.10
      --------------------
      1 4
      1 3
      2 3
      2 4

      Looking at the explain, it seems like the sorting is entirely missed out.
      The script produces correct results if I change the alias name ie;

      c = foreach b

      { x = a; x1 = order x by v2 asc; generate flatten(x1); }
      1. PIG-2530-2.patch
        2 kB
        Daniel Dai
      2. PIG-2530-1.patch
        6 kB
        Thomas Weise
      3. PIG-2530-0.patch
        4 kB
        Daniel Dai

        Activity

        Hide
        Xuefu Zhang added a comment -

        I just realized that this patch were to fix duplicated command operators, not expression alias.

        Show
        Xuefu Zhang added a comment - I just realized that this patch were to fix duplicated command operators, not expression alias.
        Hide
        Xuefu Zhang added a comment -

        Daniel Dai Looking at the patch, I don't think we can simply removed previous its exprPlan when an alias is redefined. For instance, the query in PIG-3379 doesn't work with this patch.

        The fix is probably not at the grammar, but rather at the data structure we use to represent these aliases and their expr plans.

        Any thoughts?

        Show
        Xuefu Zhang added a comment - Daniel Dai Looking at the patch, I don't think we can simply removed previous its exprPlan when an alias is redefined. For instance, the query in PIG-3379 doesn't work with this patch. The fix is probably not at the grammar, but rather at the data structure we use to represent these aliases and their expr plans. Any thoughts?
        Daniel Dai made changes -
        Status Resolved [ 5 ] Closed [ 6 ]
        Thomas Weise made changes -
        Fix Version/s 0.9.3 [ 12319456 ]
        Hide
        Daniel Dai added a comment -

        Committed to 0.9 branch as well.

        Show
        Daniel Dai added a comment - Committed to 0.9 branch as well.
        Hide
        Thomas Weise added a comment -

        Unit tests pass on 0.9 branch, would be good to commit there since it is producing false positives.

        Show
        Thomas Weise added a comment - Unit tests pass on 0.9 branch, would be good to commit there since it is producing false positives.
        Hide
        Vivek Padmanabhan added a comment -

        Thanks a lot Daniel and Thomas. Does the patch apply for 0.9.2 as well ? or is 0.9.2 having a different behavior.

        Show
        Vivek Padmanabhan added a comment - Thanks a lot Daniel and Thomas. Does the patch apply for 0.9.2 as well ? or is 0.9.2 having a different behavior.
        Daniel Dai made changes -
        Status Open [ 1 ] Resolved [ 5 ]
        Hadoop Flags Reviewed [ 10343 ]
        Assignee Daniel Dai [ daijy ]
        Fix Version/s 0.10 [ 12316246 ]
        Fix Version/s 0.11 [ 12318878 ]
        Resolution Fixed [ 1 ]
        Hide
        Daniel Dai added a comment -

        Patch committed to 0.10/trunk.

        Show
        Daniel Dai added a comment - Patch committed to 0.10/trunk.
        Hide
        Thejas M Nair added a comment -

        +1 .

        Note that this bug happens when an nested foreach operator is assigned to another and then same is reused for another operator. In the example in description -

        c = foreach b { 
                x = a; -- a is assinged to x
                x = order x by v2 asc;      -- x is reused 
                generate flatten(x);        -- because of the bug, x was substituted with a
            }
        
        
        Show
        Thejas M Nair added a comment - +1 . Note that this bug happens when an nested foreach operator is assigned to another and then same is reused for another operator. In the example in description - c = foreach b { x = a; -- a is assinged to x x = order x by v2 asc; -- x is reused generate flatten(x); -- because of the bug, x was substituted with a }
        Hide
        Thomas Weise added a comment -

        Unit tests pass on 0.10

        Show
        Thomas Weise added a comment - Unit tests pass on 0.10
        Hide
        Daniel Dai added a comment -

        Unit test pass. test-patch:
        [exec] -1 overall.
        [exec]
        [exec] +1 @author. The patch does not contain any @author tags.
        [exec]
        [exec] +1 tests included. The patch appears to include 3 new or modified tests.
        [exec]
        [exec] -1 javadoc. The javadoc tool appears to have generated 1 warning messages.
        [exec]
        [exec] +1 javac.
        [exec] The applied patch does not increase the total number of javac compiler warnings.
        [exec]
        [exec] +1 findbugs. The patch does not introduce any new Findbugs warnings.
        [exec]
        [exec] -1 release audit. The applied patch generated 529 release audit warnings (more than the trunk's current 526 warnings).

        javac and release audit warning are unrelated.

        Patch is ready to review.

        Show
        Daniel Dai added a comment - Unit test pass. test-patch: [exec] -1 overall. [exec] [exec] +1 @author. The patch does not contain any @author tags. [exec] [exec] +1 tests included. The patch appears to include 3 new or modified tests. [exec] [exec] -1 javadoc. The javadoc tool appears to have generated 1 warning messages. [exec] [exec] +1 javac. [exec] The applied patch does not increase the total number of javac compiler warnings. [exec] [exec] +1 findbugs. The patch does not introduce any new Findbugs warnings. [exec] [exec] -1 release audit. The applied patch generated 529 release audit warnings (more than the trunk's current 526 warnings). javac and release audit warning are unrelated. Patch is ready to review.
        Daniel Dai made changes -
        Attachment PIG-2530-2.patch [ 12514766 ]
        Hide
        Daniel Dai added a comment -

        Attach a better fix PIG-2530-2.patch. Thanks Thomas for adding tests.

        Show
        Daniel Dai added a comment - Attach a better fix PIG-2530 -2.patch. Thanks Thomas for adding tests.
        Thomas Weise made changes -
        Attachment PIG-2530-1.patch [ 12514739 ]
        Hide
        Thomas Weise added a comment -

        Adding test case.

        Show
        Thomas Weise added a comment - Adding test case.
        Daniel Dai made changes -
        Field Original Value New Value
        Attachment PIG-2530-0.patch [ 12514474 ]
        Hide
        Daniel Dai added a comment -

        Attach a draft patch

        Show
        Daniel Dai added a comment - Attach a draft patch
        Vivek Padmanabhan created issue -

          People

          • Assignee:
            Daniel Dai
            Reporter:
            Vivek Padmanabhan
          • Votes:
            0 Vote for this issue
            Watchers:
            2 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development