PIG-1060

MultiQuery optimization throws error for multi-level splits

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.5.0
    • Fix Version/s: 0.6.0
    • Component/s: None
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      Consider the following scenario:
      1. Multi-level splits in the map plan.
      2. Each split branch further progressing across a local-global rearrange.
      3. Output of each of these finally merged via a UNION.

      MultiQuery optimizer throws the following error in such a case:
      "ERROR 2146: Internal Error. Inconsistency in key index found during optimization."

      1. PIG-1060.patch
        40 kB
        Richard Ding

        Activity

        Daniel Dai added a comment -

        Patch committed. Thanks Richard!

        Daniel Dai added a comment -

        Patch looks good. Will commit to both trunk and 0.6 branch as it is.

        Richard Ding added a comment -

        The release audit warnings are all from html files.

        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12424143/PIG-1060.patch
        against trunk revision 833126.

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 3 new or modified tests.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 findbugs. The patch does not introduce any new Findbugs warnings.

        -1 release audit. The applied patch generated 319 release audit warnings (more than the trunk's current 318 warnings).

        +1 core tests. The patch passed core unit tests.

        +1 contrib tests. The patch passed contrib unit tests.

        Test results: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/41/testReport/
        Release audit warnings: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/41/artifact/trunk/patchprocess/releaseAuditDiffWarnings.txt
        Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/41/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Console output: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/41/console

        This message is automatically generated.

        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12424143/PIG-1060.patch
        against trunk revision 833102.

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 3 new or modified tests.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 findbugs. The patch does not introduce any new Findbugs warnings.

        -1 release audit. The applied patch generated 319 release audit warnings (more than the trunk's current 318 warnings).

        -1 core tests. The patch failed core unit tests.

        +1 contrib tests. The patch passed contrib unit tests.

        Test results: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/40/testReport/
        Release audit warnings: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/40/artifact/trunk/patchprocess/releaseAuditDiffWarnings.txt
        Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/40/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Console output: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/40/console

        This message is automatically generated.

        Richard Ding added a comment -

        This patch fixes the bug.

        Viraj Bhat added a comment -

        Hi Ankur and Richard,
        I have a script that demonstrates a similar problem, which can be worked around by using the -M option. The script reproduces the problem even without the UNION operator, but it still has properties 1 and 2 of the original problem description.

        Try commenting out the F alias; the script then works fine.

        
        ORGINALDATA = load '/user/viraj/somedata.txt' using PigStorage() as (col1, col2, col3, col4, col5, col6, col7, col8);

        --Check data
        A = foreach ORGINALDATA generate col1, col2, col3, col4, col5, col6;
        B = group A all;
        C = foreach B generate COUNT(A);
        store C into '/user/viraj/result1';

        D = filter A by (col1 == col2) or (col1 == col3);
        E = group D all;
        F = foreach E generate COUNT(D);
        --try commenting F
        store F into '/user/viraj/result2';

        G = filter D by (col4 == col5);
        H = group G all;
        I = foreach H generate COUNT(G);
        store I into '/user/viraj/result3';

        J = filter G by (((col6 == 'm') or (col6 == 'M')) and (col6 == 1)) or (((col6 == 'f') or (col6 == 'F')) and (col6 == 0)) or ((col6 == '') and (col6 == -1));
        K = group J all;
        L = foreach K generate COUNT(J);
        store L into '/user/viraj/result4';
        
        
        Richard Ding added a comment -

        Never mind, there is a typo in my script. Here is the stack trace:

        Caused by: org.apache.pig.impl.plan.optimizer.OptimizerException: ERROR 2146: Internal Error. Inconsistency in key index found during optimization.
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer.addShiftedKeyInfoIndex(MultiQueryOptimizer.java:679)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer.addShiftedKeyInfoIndex(MultiQueryOptimizer.java:686)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer.mergeOneReducePlanWithIndex(MultiQueryOptimizer.java:584)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer.mergeAllMapReduceSplittees(MultiQueryOptimizer.java:903)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer.mergeMapReduceSplittees(MultiQueryOptimizer.java:371)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer.visitMROp(MultiQueryOptimizer.java:175)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceOper.visit(MapReduceOper.java:209)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceOper.visit(MapReduceOper.java:44)
        at org.apache.pig.impl.plan.ReverseDependencyOrderWalker.walk(ReverseDependencyOrderWalker.java:69)
        at org.apache.pig.impl.plan.PlanVisitor.visit(PlanVisitor.java:51)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer.visit(MultiQueryOptimizer.java:90)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.compile(MapReduceLauncher.java:393)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:103)
        at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.execute(HExecutionEngine.java:265)

        Richard Ding added a comment -

        Hi Ankur,

        I can't reproduce the bug with the latest code in trunk. Can you please attach the log output in the bug?

        Thanks,
        – Richard

        Ankur added a comment -

        Here's a sample script to illustrate the issue. Note that sample data isn't very important here since the optimization and execution fail.
        === test.pig ====

        data = LOAD 'dummy' as (name:chararray, freq:int);

        filter1 = FILTER data BY freq < 5;
        group1 = GROUP filter1 BY name;
        proj1 = FOREACH group1 GENERATE FLATTEN(group), 'string1', SUM(filter1.freq);

        filter2 = FILTER data BY freq > 5;
        group2 = GROUP filter2 BY name;
        proj2 = FOREACH group2 GENERATE FLATTEN(group), 'string2', SUM(filter2.freq);

        filter3 = FILTER filter2 BY freq < 10;
        group3 = GROUP filter3 BY name;
        proj3 = FOREACH group3 GENERATE FLATTEN(group), 'string3', SUM(filter3.freq);

        filter4 = FILTER filter3 BY freq > 7;
        group4 = GROUP filter4 BY name;
        proj4 = FOREACH group4 GENERATE FLATTEN(group), 'string4', SUM(filter4.freq);

        M1 = LIMIT proj1 10;
        M2 = LIMIT proj2 10;
        M3 = LIMIT proj3 10;
        M4 = LIMIT proj4 10;

        U = UNION M1, M2, M3, M4;

        STORE U INTO 'res' USING PigStorage();

        The dot output can be dumped via the command "explain -dot -script test.pig;" to visualize the scenario.
        A surprising observation is that even with MultiQuery turned off using -M, the MultiQuery optimizer still seems to run and fails the script.
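
For readers without a Pig installation, the "multi-level split" shape of test.pig can be seen by counting how many aliases are read by more than one downstream alias. The following is only a rough Python sketch of that counting, not a description of how Pig's MultiQueryOptimizer represents plans. The INPUTS map is transcribed by hand from the script above (the LIMIT, UNION and STORE steps are left out), and the function names are hypothetical. It assumes that an alias with several consumers becomes a split point in the plan.

```python
# Alias -> the alias it reads from, transcribed by hand from test.pig.
# 'data' is the LOAD and therefore has no entry of its own.
INPUTS = {
    "filter1": "data", "group1": "filter1", "proj1": "group1",
    "filter2": "data", "group2": "filter2", "proj2": "group2",
    "filter3": "filter2", "group3": "filter3", "proj3": "group3",
    "filter4": "filter3", "group4": "filter4", "proj4": "group4",
}


def consumers(inputs):
    """Invert the alias -> input map into input -> list of consumers."""
    out = {}
    for alias, src in inputs.items():
        out.setdefault(src, []).append(alias)
    return out


def split_points(inputs):
    """Aliases read by more than one downstream alias (assumed split points)."""
    return sorted(a for a, c in consumers(inputs).items() if len(c) > 1)


def split_depth(alias, inputs):
    """Count the split points on the path from the load down to `alias`."""
    splits = set(split_points(inputs))
    depth = 0
    cur = inputs.get(alias)
    while cur is not None:
        if cur in splits:
            depth += 1
        cur = inputs.get(cur)
    return depth


if __name__ == "__main__":
    print("split points:", split_points(INPUTS))
    for proj in ("proj1", "proj2", "proj3", "proj4"):
        print(proj, "sits below", split_depth(proj, INPUTS), "split(s)")
```

Under these assumptions, data, filter2 and filter3 each feed two branches, so proj3 and proj4 sit below three nested splits, and every branch goes through its own GROUP before the UNION. That matches points 1 to 3 of the description.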


          People

          • Assignee:
            Richard Ding
            Reporter:
            Ankur