Pig: PIG-1308

Infinite loop in JobClient when reading from BinStorage. Message: [org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 2]

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.7.0
    • Component/s: None
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      A simple script fails to read files with BinStorage() and never submits jobs to the JobTracker. This happens with trunk but not with the Pig 0.6 branch.

      data = load 'binstoragesample' using BinStorage() as (s, m, l);
      A = foreach data generate s#'key' as value;
      X = limit A 20;
      dump X;
      

      When this script is run, the following message is logged repeatedly:
      2010-03-18 22:31:22,296 [main] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 2
      2010-03-18 22:32:01,574 [main] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 2
      2010-03-18 22:32:43,276 [main] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 2
      2010-03-18 22:33:21,743 [main] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 2
      2010-03-18 22:34:02,004 [main] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 2
      2010-03-18 22:34:43,442 [main] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 2
      2010-03-18 22:35:25,907 [main] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 2
      2010-03-18 22:36:07,402 [main] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 2
      2010-03-18 22:36:48,596 [main] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 2
      2010-03-18 22:37:28,014 [main] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 2
      2010-03-18 22:38:04,823 [main] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 2
      2010-03-18 22:38:38,981 [main] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 2
      2010-03-18 22:39:12,220 [main] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 2
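      The timestamps above suggest why the run looks like an infinite loop. The report does not say that each log line corresponds to exactly one optimizer iteration, so treat that as an assumption. Under that assumption, the 13 lines span 12 intervals. Combining that spacing with the 500-iteration cap mentioned in the root-cause comment gives a rough estimate of the total running time:

```latex
\begin{align*}
\bar{t} &\approx \frac{22{:}39{:}12.220 - 22{:}31{:}22.296}{12}
         = \frac{469.924\ \text{s}}{12} \approx 39.2\ \text{s per iteration} \\
T &\approx 500 \times 39.2\ \text{s} \approx 19{,}600\ \text{s} \approx 5.4\ \text{h}
\end{align*}
```

      So the loop is bounded, but under this assumption the job would sit in planning for several hours before reaching the JobTracker.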

      A stack trace revealed:

      at org.apache.pig.impl.io.ReadToEndLoader.init(ReadToEndLoader.java:144)
      at org.apache.pig.impl.io.ReadToEndLoader.<init>(ReadToEndLoader.java:115)
      at org.apache.pig.builtin.BinStorage.getSchema(BinStorage.java:404)
      at org.apache.pig.impl.logicalLayer.LOLoad.determineSchema(LOLoad.java:167)
      at org.apache.pig.impl.logicalLayer.LOLoad.getProjectionMap(LOLoad.java:263)
      at org.apache.pig.impl.logicalLayer.ProjectionMapCalculator.visit(ProjectionMapCalculator.java:112)
      at org.apache.pig.impl.logicalLayer.LOLoad.visit(LOLoad.java:210)
      at org.apache.pig.impl.logicalLayer.LOLoad.visit(LOLoad.java:52)
      at org.apache.pig.impl.plan.DependencyOrderWalker.walk(DependencyOrderWalker.java:69)
      at org.apache.pig.impl.plan.PlanVisitor.visit(PlanVisitor.java:51)
      at org.apache.pig.impl.logicalLayer.optimizer.LogicalTransformer.rebuildProjectionMaps(LogicalTransformer.java:76)
      at org.apache.pig.impl.logicalLayer.optimizer.LogicalOptimizer.optimize(LogicalOptimizer.java:216)
      at org.apache.pig.PigServer.compileLp(PigServer.java:883)
      at org.apache.pig.PigServer.store(PigServer.java:564)

      The BinStorage data was generated from two datasets using LIMIT and UNION:

      Large1 = load 'input1'  using PigStorage();
      Large2 = load 'input2' using PigStorage();
      V = limit Large1 10000;
      C = limit Large2 10000;
      U = union V, C;
      store U into 'binstoragesample' using BinStorage();
      
      Attachments

      1. PIG-1308.patch (6 kB, Pradeep Kamath)

        Activity

        Pradeep Kamath added a comment -

        The root cause of the issue is that OpLimitOptimizer has a relaxed check() implementation. It only checks whether the node matched by RuleMatcher is an LOLimit, and that is true whenever the plan contains an LOLimit. Because OpLimitOptimizer therefore always matches, the optimizer runs all rules for 500 iterations (the current maximum).

        The attached patch fixes the issue by tightening the implementation of OpLimitOptimizer.check() to return false in cases where LOLimit cannot be pushed up.
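        The following minimal Java sketch makes the root cause concrete with a fixed-point optimizer loop. All of its names are hypothetical simplifications and not Pig's actual OpLimitOptimizer, RuleMatcher, or LogicalOptimizer API: LimitOptimizerSketch, Rule, RelaxedRule, TightRule, and the string-list "plan". The rule that a LIMIT may move only above a directly preceding FOREACH is also an assumption made for illustration. Only the 500-iteration cap comes from the comment above. The sketch shows that a check() which matches any plan containing a LIMIT runs until the cap. A check() which matches only when the LIMIT can still move stops after one transformation.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model of a rule-driven optimizer loop.
// Not Pig's real API: it only illustrates why an always-true check()
// keeps the optimizer from converging before the iteration cap.
public class LimitOptimizerSketch {
    static final int MAX_ITERATIONS = 500; // mirrors "500 (the current max)"

    interface Rule {
        boolean check(List<String> plan);  // does the rule apply?
        void transform(List<String> plan); // apply it (may be a no-op)
    }

    // Shared transform: swap LIMIT with a directly preceding FOREACH, if any.
    static abstract class PushUpLimit implements Rule {
        public void transform(List<String> plan) {
            int i = plan.indexOf("LIMIT");
            if (i > 0 && plan.get(i - 1).equals("FOREACH")) {
                plan.set(i - 1, "LIMIT");
                plan.set(i, "FOREACH");
            }
        }
    }

    // Relaxed check: matches whenever the plan contains a LIMIT at all,
    // even after the LIMIT can no longer move.
    static class RelaxedRule extends PushUpLimit {
        public boolean check(List<String> plan) {
            return plan.contains("LIMIT");
        }
    }

    // Tightened check: matches only if the LIMIT can actually be pushed up.
    static class TightRule extends PushUpLimit {
        public boolean check(List<String> plan) {
            int i = plan.indexOf("LIMIT");
            return i > 0 && plan.get(i - 1).equals("FOREACH");
        }
    }

    // Applies the rule until it stops matching or the cap is reached,
    // and returns how many iterations were run.
    static int optimize(Rule rule, List<String> plan) {
        int iterations = 0;
        while (iterations < MAX_ITERATIONS && rule.check(plan)) {
            rule.transform(plan);
            iterations++;
        }
        return iterations;
    }

    public static void main(String[] args) {
        List<String> p1 = new ArrayList<>(List.of("LOAD", "FOREACH", "LIMIT"));
        List<String> p2 = new ArrayList<>(List.of("LOAD", "FOREACH", "LIMIT"));
        System.out.println("relaxed: " + optimize(new RelaxedRule(), p1));
        System.out.println("tight: " + optimize(new TightRule(), p2));
        System.out.println(p2);
    }
}
```

        Running main prints "relaxed: 500", then "tight: 1", then "[LOAD, LIMIT, FOREACH]". In the sketch, the relaxed rule performs the useful swap on its first pass and then keeps matching without changing anything. The loop exits only because of the cap.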

        Hadoop QA added a comment -

        +1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12439354/PIG-1308.patch
        against trunk revision 925513.

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 4 new or modified tests.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 findbugs. The patch does not introduce any new Findbugs warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 core tests. The patch passed core unit tests.

        +1 contrib tests. The patch passed contrib unit tests.

        Test results: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/259/testReport/
        Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/259/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Console output: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/259/console

        This message is automatically generated.

        Daniel Dai added a comment -

        +1

        Pradeep Kamath added a comment -

        Patch committed to trunk

          People

          • Assignee: Pradeep Kamath
          • Reporter: Viraj Bhat
          • Votes: 0
          • Watchers: 0
