PIG-1157: Successive replicated joins do not generate a Map Reduce plan and fail due to OOM

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.6.0
    • Fix Version/s: 0.7.0
    • Component/s: impl
    • Labels:
      None

      Description

      Hi all,
      I have a script which does 2 replicated joins in succession. Please note that the inputs do not exist on the HDFS.

      A = LOAD '/tmp/abc' USING PigStorage('\u0001') AS (a:long, b, c);
      A1 = FOREACH A GENERATE a;
      B = GROUP A1 BY a;
      C = LOAD '/tmp/xyz' USING PigStorage('\u0001') AS (x:long, y);
      D = JOIN C BY x, B BY group USING "replicated";
      E = JOIN A BY a, D by x USING "replicated";
      dump E;
      

      2009-12-16 19:12:00,253 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 4
      2009-12-16 19:12:00,254 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - Merged 1 map-only splittees.
      2009-12-16 19:12:00,254 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - Merged 1 map-reduce splittees.
      2009-12-16 19:12:00,254 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - Merged 2 out of total 2 splittees.
      2009-12-16 19:12:00,254 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 2
      2009-12-16 19:12:00,713 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2998: Unhandled internal error. unable to create new native thread
      Details at logfile: pig_1260990666148.log

      Looking at the log file:

      Pig Stack Trace
      ---------------
      ERROR 2998: Unhandled internal error. unable to create new native thread

      java.lang.OutOfMemoryError: unable to create new native thread
      at java.lang.Thread.start0(Native Method)
      at java.lang.Thread.start(Thread.java:597)
      at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:131)
      at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.execute(HExecutionEngine.java:265)
      at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:773)
      at org.apache.pig.PigServer.store(PigServer.java:522)
      at org.apache.pig.PigServer.openIterator(PigServer.java:458)
      at org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:532)
      at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:190)
      at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:166)
      at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:142)
      at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:89)
      at org.apache.pig.Main.main(Main.java:397)
      ================================================================================

      Looking at the explain output, we find that no Map Reduce plan is generated.

      Why is the M/R plan not generated?

      Attaching the script and explain output.
      Viraj

      1. PIG-1157.patch
        5 kB
        Richard Ding
      2. PIG-1157.patch
        4 kB
        Richard Ding
      3. oomreplicatedjoin.pig
        0.3 kB
        Viraj Bhat
      4. replicatedjoinexplain.log
        7 kB
        Viraj Bhat

        Activity

        Viraj Bhat added a comment -

        Explain output and Pig script.

        Richard Ding added a comment -

        Two quick observations:

        1. The script works if multi-query optimization is disabled (-M).
        2. The script also works if using regular join instead of replicated join.

        I'll look into it further.

        Viraj Bhat added a comment -

        Hi Richard,

        Thanks for your suggestion; it works. We could also use the "exec" statement before the alias E to prevent the implicit dependency.

        How hard or easy is it for Pig to find out whether there is an implicit dependency? Pig already has a copy of the logical plan in memory, from which it knows that alias E requires output from D, which is generated in the previous step.

        Can we not warn the user about this implicit dependency?

        Viraj

        Richard Ding added a comment -

        The problem is that, by merging an MR splittee with an FR join, the MultiQuery optimizer may introduce a directed cycle into the graph of the MR plan. This patch fixes the problem by not merging FR splittees.

        This is actually stronger than necessary. A better solution would be to check if merging a MR splittee would form a directed cycle in the original DAG before merging it, and if not, allow the merge to go ahead.
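
The check described above can be illustrated with a small sketch. This is not code from the patch or from Pig itself: the job names and the dictionary-based plan representation are hypothetical, and the shape of the plan is only an assumption about how the script in this report might compile. The idea is that merging a splittee into its splitter contracts the edge between them, and in a DAG that contraction forms a directed cycle exactly when the splittee can also be reached from the splitter along some other path.

```python
def reachable(edges, start, goal, skip_edge=None):
    """Return True if goal can be reached from start, ignoring skip_edge."""
    stack, seen = [start], {start}
    while stack:
        node = stack.pop()
        for nxt in edges.get(node, ()):
            if (node, nxt) == skip_edge:
                continue  # pretend the direct splitter -> splittee edge is absent
            if nxt == goal:
                return True
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return False


def merge_creates_cycle(edges, splitter, splittee):
    """True if merging splittee into splitter would form a directed cycle.

    Assumes edges describes a DAG that contains the edge splitter -> splittee.
    """
    return reachable(edges, splitter, splittee, skip_edge=(splitter, splittee))


# Hypothetical MR plan: each key is a job, each value lists the jobs that
# consume its output. join_E reads both load_A (directly) and join_D.
plan = {
    "load_A": ["group_B", "join_E"],
    "group_B": ["join_D"],
    "join_D": ["join_E"],
    "join_E": [],
}

print(merge_creates_cycle(plan, "load_A", "join_E"))
print(merge_creates_cycle(plan, "load_A", "group_B"))
```

In this hypothetical plan, merging join_E into load_A is unsafe because join_E also depends on load_A indirectly through group_B and join_D, while merging group_B is safe. A check of this kind would let the optimizer keep merging FR splittees whenever no such indirect path exists.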

        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12428359/PIG-1157.patch
        against trunk revision 892125.

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 3 new or modified tests.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 findbugs. The patch does not introduce any new Findbugs warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        -1 core tests. The patch failed core unit tests.

        +1 contrib tests. The patch passed contrib unit tests.

        Test results: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/140/testReport/
        Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/140/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Console output: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/140/console

        This message is automatically generated.

        Olga Natkovich added a comment -

        +1. Patch looks good. Will commit once the tests pass.

        Hadoop QA added a comment -

        +1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12428448/PIG-1157.patch
        against trunk revision 892125.

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 3 new or modified tests.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 findbugs. The patch does not introduce any new Findbugs warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 core tests. The patch passed core unit tests.

        +1 contrib tests. The patch passed contrib unit tests.

        Test results: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/141/testReport/
        Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/141/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Console output: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/141/console

        This message is automatically generated.

        Olga Natkovich added a comment -

        Patch committed. Thanks, Richard.


          People

          • Assignee: Richard Ding
          • Reporter: Viraj Bhat