Pig
  1. Pig
  2. PIG-834

incorrect plan when algebraic functions are nested

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.7.0
    • Component/s: impl
    • Labels:
      None

      Description

      a = load 'students.txt' as (c1,c2,c3,c4);
      c = group a by c2;
      f = foreach c generate COUNT(org.apache.pig.builtin.Distinct($1.$2));

      Notice that Distinct udf is missing in Combiner and reduce stage. As a result distinct does not function, and incorrect results are produced.
      Distinct should have been evaluated in the 3 stages and output of Distinct should be given to COUNT in reduce stage.

      # Map Reduce Plan                                  
      #--------------------------------------------------
      MapReduce node 1-122
      Map Plan
      Local Rearrange[tuple]{bytearray}(false) - 1-139
      |   |
      |   Project[bytearray][1] - 1-140
      |
      |---New For Each(false,false)[bag] - 1-127
          |   |
          |   POUserFunc(org.apache.pig.builtin.COUNT$Initial)[tuple] - 1-125
          |   |
          |   |---POUserFunc(org.apache.pig.builtin.Distinct)[bag] - 1-126
          |       |
          |       |---Project[bag][2] - 1-123
          |           |
          |           |---Project[bag][1] - 1-124
          |   |
          |   Project[bytearray][0] - 1-133
          |
          |---Pre Combiner Local Rearrange[tuple]{Unknown} - 1-141
              |
              |---Load(hdfs://wilbur11.labs.corp.sp1.yahoo.com/user/tejas/students.txt:org.apache.pig.builtin.PigStorage) - 1-111--------
      Combine Plan
      Local Rearrange[tuple]{bytearray}(false) - 1-143
      |   |
      |   Project[bytearray][1] - 1-144
      |
      |---New For Each(false,false)[bag] - 1-132
          |   |
          |   POUserFunc(org.apache.pig.builtin.COUNT$Intermediate)[tuple] - 1-130
          |   |
          |   |---Project[bag][0] - 1-135
          |   |
          |   Project[bytearray][1] - 1-134
          |
          |---POCombinerPackage[tuple]{bytearray} - 1-137--------
      Reduce Plan
      Store(fakefile:org.apache.pig.builtin.PigStorage) - 1-121
      |
      |---New For Each(false)[bag] - 1-120
          |   |
          |   POUserFunc(org.apache.pig.builtin.COUNT$Final)[long] - 1-119
          |   |
          |   |---Project[bag][0] - 1-136
          |
          |---POCombinerPackage[tuple]{bytearray} - 1-145--------
      Global sort: false
      
      1. pig-834.patch
        3 kB
        Ashutosh Chauhan
      2. pig-834_3.patch
        5 kB
        Ashutosh Chauhan
      3. pig-834_2.patch
        5 kB
        Ashutosh Chauhan

        Activity

        Thejas M Nair created issue -
        Thejas M Nair made changes -
        Field Original Value New Value
        Description a = load 'students.txt' as (c1,c2,c3,c4);
        c = group a by c2;
        f = foreach c generate COUNT(org.apache.pig.builtin.Distinct($1.$2));

        Notice that Distinct udf is missing in Combiner and reduce stage. As a result distinct does not function, and incorrect results are produced.
        Distinct should have been evaluated in the 3 stages and output of Distinct should be given to COUNT in reduce stage.


        # Map Reduce Plan
        #--------------------------------------------------
        MapReduce node 1-122
        Map Plan
        Local Rearrange[tuple]{bytearray}(false) - 1-139
        | |
        | Project[bytearray][1] - 1-140
        |
        |---New For Each(false,false)[bag] - 1-127
            | |
            | POUserFunc(org.apache.pig.builtin.COUNT$Initial)[tuple] - 1-125
            | |
            | |---POUserFunc(org.apache.pig.builtin.Distinct)[bag] - 1-126
            | |
            | |---Project[bag][2] - 1-123
            | |
            | |---Project[bag][1] - 1-124
            | |
            | Project[bytearray][0] - 1-133
            |
            |---Pre Combiner Local Rearrange[tuple]{Unknown} - 1-141
                |
                |---Load(hdfs://wilbur11.labs.corp.sp1.yahoo.com/user/tejas/students.txt:org.apache.pig.builtin.PigStorage) - 1-111--------
        Combine Plan
        Local Rearrange[tuple]{bytearray}(false) - 1-143
        | |
        | Project[bytearray][1] - 1-144
        |
        |---New For Each(false,false)[bag] - 1-132
            | |
            | POUserFunc(org.apache.pig.builtin.COUNT$Intermediate)[tuple] - 1-130
            | |
            | |---Project[bag][0] - 1-135
            | |
            | Project[bytearray][1] - 1-134
            |
            |---POCombinerPackage[tuple]{bytearray} - 1-137--------
        Reduce Plan
        Store(fakefile:org.apache.pig.builtin.PigStorage) - 1-121
        |
        |---New For Each(false)[bag] - 1-120
            | |
            | POUserFunc(org.apache.pig.builtin.COUNT$Final)[long] - 1-119
            | |
            | |---Project[bag][0] - 1-136
            |
            |---POCombinerPackage[tuple]{bytearray} - 1-145--------
        Global sort: false
        a = load 'students.txt' as (c1,c2,c3,c4);
        c = group a by c2;
        f = foreach c generate COUNT(org.apache.pig.builtin.Distinct($1.$2));

        Notice that Distinct udf is missing in Combiner and reduce stage. As a result distinct does not function, and incorrect results are produced.
        Distinct should have been evaluated in the 3 stages and output of Distinct should be given to COUNT in reduce stage.

        {code}
        # Map Reduce Plan
        #--------------------------------------------------
        MapReduce node 1-122
        Map Plan
        Local Rearrange[tuple]{bytearray}(false) - 1-139
        | |
        | Project[bytearray][1] - 1-140
        |
        |---New For Each(false,false)[bag] - 1-127
            | |
            | POUserFunc(org.apache.pig.builtin.COUNT$Initial)[tuple] - 1-125
            | |
            | |---POUserFunc(org.apache.pig.builtin.Distinct)[bag] - 1-126
            | |
            | |---Project[bag][2] - 1-123
            | |
            | |---Project[bag][1] - 1-124
            | |
            | Project[bytearray][0] - 1-133
            |
            |---Pre Combiner Local Rearrange[tuple]{Unknown} - 1-141
                |
                |---Load(hdfs://wilbur11.labs.corp.sp1.yahoo.com/user/tejas/students.txt:org.apache.pig.builtin.PigStorage) - 1-111--------
        Combine Plan
        Local Rearrange[tuple]{bytearray}(false) - 1-143
        | |
        | Project[bytearray][1] - 1-144
        |
        |---New For Each(false,false)[bag] - 1-132
            | |
            | POUserFunc(org.apache.pig.builtin.COUNT$Intermediate)[tuple] - 1-130
            | |
            | |---Project[bag][0] - 1-135
            | |
            | Project[bytearray][1] - 1-134
            |
            |---POCombinerPackage[tuple]{bytearray} - 1-137--------
        Reduce Plan
        Store(fakefile:org.apache.pig.builtin.PigStorage) - 1-121
        |
        |---New For Each(false)[bag] - 1-120
            | |
            | POUserFunc(org.apache.pig.builtin.COUNT$Final)[long] - 1-119
            | |
            | |---Project[bag][0] - 1-136
            |
            |---POCombinerPackage[tuple]{bytearray} - 1-145--------
        Global sort: false
        {code}
        Thejas M Nair made changes -
        Fix Version/s 0.7.0 [ 12314397 ]
        Hide
        Olga Natkovich added a comment -

        The short term solution will be to catch this case and not enable combiner

        Show
        Olga Natkovich added a comment - The short term solution will be to catch this case and not enable combiner
        Olga Natkovich made changes -
        Priority Critical [ 2 ] Major [ 3 ]
        Olga Natkovich made changes -
        Assignee Ashutosh Chauhan [ ashutoshc ]
        Hide
        Ashutosh Chauhan added a comment -

        In this patch, I look for a pattern of POUserFunc followed by another POUserFunc in the inner plan of ForEach and if thats found I flag the combiner optimizer to not fire. This disables the combiner for this particular query (test case included). Wondering if this fix is sufficient for this bug ?

        Show
        Ashutosh Chauhan added a comment - In this patch, I look for a pattern of POUserFunc followed by another POUserFunc in the inner plan of ForEach and if thats found I flag the combiner optimizer to not fire. This disables the combiner for this particular query (test case included). Wondering if this fix is sufficient for this bug ?
        Ashutosh Chauhan made changes -
        Attachment pig-834.patch [ 12434920 ]
        Hide
        Ashutosh Chauhan added a comment -

        Correct approach is following: If leaf of inner plan of ForEach is not combinable then we dont put combiner in any case. If it is, there should not be any other combinable POUserFunc in the ForEach's inner plan. First check already exists in trunk. This patch checks for this second conditon and makes sure not to fire combiner if there is any other combinable POUserFunc in the ForEach inner plan apart from leaf POUserFunc.

        Show
        Ashutosh Chauhan added a comment - Correct approach is following: If leaf of inner plan of ForEach is not combinable then we dont put combiner in any case. If it is, there should not be any other combinable POUserFunc in the ForEach's inner plan. First check already exists in trunk. This patch checks for this second conditon and makes sure not to fire combiner if there is any other combinable POUserFunc in the ForEach inner plan apart from leaf POUserFunc.
        Ashutosh Chauhan made changes -
        Attachment pig-834_2.patch [ 12435027 ]
        Ashutosh Chauhan made changes -
        Status Open [ 1 ] Patch Available [ 10002 ]
        Ashutosh Chauhan made changes -
        Status Patch Available [ 10002 ] Open [ 1 ]
        Hide
        Ashutosh Chauhan added a comment -

        Trying to get hudson going on this.

        Show
        Ashutosh Chauhan added a comment - Trying to get hudson going on this.
        Ashutosh Chauhan made changes -
        Status Open [ 1 ] Patch Available [ 10002 ]
        Hide
        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12435027/pig-834_2.patch
        against trunk revision 907760.

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 3 new or modified tests.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 findbugs. The patch does not introduce any new Findbugs warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        -1 core tests. The patch failed core unit tests.

        +1 contrib tests. The patch passed contrib unit tests.

        Test results: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/195/testReport/
        Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/195/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Console output: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/195/console

        This message is automatically generated.

        Show
        Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12435027/pig-834_2.patch against trunk revision 907760. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 3 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/195/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/195/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Console output: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/195/console This message is automatically generated.
        Hide
        Ashutosh Chauhan added a comment -

        Another hudson quirk : ( Failed test passes successfully on local machine. Patch is ready for review.

        Show
        Ashutosh Chauhan added a comment - Another hudson quirk : ( Failed test passes successfully on local machine. Patch is ready for review.
        Ashutosh Chauhan made changes -
        Status Patch Available [ 10002 ] Open [ 1 ]
        Hide
        Ashutosh Chauhan added a comment -

        Instead of having recursive function walking on plan, better to have a visitor doing that. So, this patch replaces that function with a visitor.

        Show
        Ashutosh Chauhan added a comment - Instead of having recursive function walking on plan, better to have a visitor doing that. So, this patch replaces that function with a visitor.
        Ashutosh Chauhan made changes -
        Attachment pig-834_3.patch [ 12435493 ]
        Ashutosh Chauhan made changes -
        Status Open [ 1 ] Patch Available [ 10002 ]
        Hide
        Hadoop QA added a comment -

        +1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12435493/pig-834_3.patch
        against trunk revision 908324.

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 3 new or modified tests.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 findbugs. The patch does not introduce any new Findbugs warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 core tests. The patch passed core unit tests.

        +1 contrib tests. The patch passed contrib unit tests.

        Test results: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/208/testReport/
        Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/208/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Console output: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/208/console

        This message is automatically generated.

        Show
        Hadoop QA added a comment - +1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12435493/pig-834_3.patch against trunk revision 908324. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 3 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/208/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/208/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Console output: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/208/console This message is automatically generated.
        Hide
        Richard Ding added a comment -

        +1 for commit.

        Show
        Richard Ding added a comment - +1 for commit.
        Hide
        Ashutosh Chauhan added a comment -

        Patch checked-in.

        Show
        Ashutosh Chauhan added a comment - Patch checked-in.
        Ashutosh Chauhan made changes -
        Status Patch Available [ 10002 ] Resolved [ 5 ]
        Resolution Fixed [ 1 ]
        Daniel Dai made changes -
        Status Resolved [ 5 ] Closed [ 6 ]

          People

          • Assignee:
            Ashutosh Chauhan
            Reporter:
            Thejas M Nair
          • Votes:
            0 Vote for this issue
            Watchers:
            0 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development