Pig / PIG-1798

nested foreach - alias for expression does not work

    Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 0.8.0
    • Fix Version/s: None
    • Component/s: None
    • Labels:
      None

      Description

      In the following example, the nested foreach statement uses an alias, ld, for the output of the Distinct UDF. Pig gives an error during query plan generation:

      grunt> l = load 'x' as (a, b);
      grunt> g = group l by a;
      grunt> f = foreach g { ld = org.apache.pig.builtin.Distinct(l); f = filter ld by $0 > 1; generate COUNT(f);}
      grunt> explain f;
      2011-01-11 12:18:33,908 [main] WARN  org.apache.pig.PigServer - Encountered Warning IMPLICIT_CAST_TO_INT 1 time(s).
      2011-01-11 12:18:33,908 [main] INFO  org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - pig.usenewlogicalplan is set to true. New logical plan will be used.
      2011-01-11 12:18:33,941 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1067: Unable to explain alias f
      Details at logfile: /Users/tejas/pig_comb2/trunk/pig_1294777094048.log
      
      
      
      less /Users/tejas/pig_comb2/trunk/pig_1294777094048.log
      Pig Stack Trace
      ---------------
      ERROR 1067: Unable to explain alias f
      
      org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1067: Unable to explain alias f
              at org.apache.pig.PigServer.explain(PigServer.java:1053)
              at org.apache.pig.tools.grunt.GruntParser.explainCurrentBatch(GruntParser.java:358)
              at org.apache.pig.tools.grunt.GruntParser.processExplain(GruntParser.java:290)
              at org.apache.pig.tools.grunt.GruntParser.processExplain(GruntParser.java:253)
              at org.apache.pig.tools.pigscript.parser.PigScriptParser.Explain(PigScriptParser.java:665)
              at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:325)
              at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:166)
              at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:142)
              at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:76)
              at org.apache.pig.Main.run(Main.java:475)
              at org.apache.pig.Main.main(Main.java:109)
      Caused by: java.lang.NullPointerException
              at org.apache.pig.newplan.logical.ForeachInnerPlanVisitor.translateInnerPlanConnection(ForeachInnerPlanVisitor.java:87)
              at org.apache.pig.newplan.logical.ForeachInnerPlanVisitor.visit(ForeachInnerPlanVisitor.java:245)
              at org.apache.pig.impl.logicalLayer.LOFilter.visit(LOFilter.java:114)
              at org.apache.pig.impl.logicalLayer.LogicalOperator.visit(LogicalOperator.java:1)
              at org.apache.pig.impl.plan.DependencyOrderWalkerWOSeenChk.walk(DependencyOrderWalkerWOSeenChk.java:71)
              at org.apache.pig.impl.plan.PlanVisitor.visit(PlanVisitor.java:51)
              at org.apache.pig.newplan.logical.LogicalPlanMigrationVistor.visit(LogicalPlanMigrationVistor.java:245)
              at org.apache.pig.impl.logicalLayer.LOForEach.visit(LOForEach.java:132)
              at org.apache.pig.impl.logicalLayer.LogicalOperator.visit(LogicalOperator.java:1)
              at org.apache.pig.impl.plan.DependencyOrderWalker.walk(DependencyOrderWalker.java:70)
              at org.apache.pig.impl.plan.PlanVisitor.visit(PlanVisitor.java:51)
              at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.compile(HExecutionEngine.java:264)
              at org.apache.pig.PigServer.compilePp(PigServer.java:1460)
              at org.apache.pig.PigServer.explain(PigServer.java:1022)
              ... 10 more
      ================================================================================
      
      

          Activity

          Daniel Dai added a comment -

          I don't think there is an easy solution (I'm not even sure this script is valid). In a nested plan, we assume input bags are streamed into the nested operators (except LOGenerate). If there is a UDF, it should take a tuple. However, Distinct takes the whole bag, which breaks that rule. For example, the following script should be doable:

          f = foreach g { ld = ABS(l.$0); f = filter ld by $0 > 1; generate COUNT(f);}

          ABS sees l as a tuple stream instead of a bag. This syntax is not currently supported, but it is totally doable. However, if you want to use a UDF that sees l as a bag, you can only do so in the generate statement, which takes every input as a bag:

          f = foreach g { f = filter l by $0 > 1; generate COUNT(Distinct(f));}

          Unlinking this issue from 0.10.
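
          The distinction between operators that consume a tuple stream and UDFs that consume a whole bag can be illustrated outside Pig. The following is only a minimal Python sketch of the semantics of the workaround script above (filter inside the nested block, Distinct applied in generate). It is not Pig's implementation; the function names (group_by_first, nested_filter, distinct, count, count_distinct_filtered) and the sample rows are hypothetical and exist only for this illustration.

```python
from collections import defaultdict


def group_by_first(rows):
    """Model of `g = group l by a`: map each key to its bag of tuples."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[0]].append(row)
    return dict(groups)


def nested_filter(bag, predicate):
    """Model of a nested operator: it looks at one tuple at a time."""
    return [t for t in bag if predicate(t)]


def distinct(bag):
    """Model of a bag-level UDF: it needs the whole bag as its input."""
    seen = set()
    out = []
    for t in bag:
        if t not in seen:
            seen.add(t)
            out.append(t)
    return out


def count(bag):
    """Model of COUNT applied to a bag."""
    return len(bag)


def count_distinct_filtered(rows):
    """Model of: foreach g { f = filter l by $0 > 1; generate COUNT(Distinct(f)); }"""
    result = {}
    for key, bag in group_by_first(rows).items():
        f = nested_filter(bag, lambda t: t[0] > 1)  # tuple-stream step
        result[key] = count(distinct(f))            # bag-level step, done last
    return result


# Hypothetical sample data standing in for the relation l with fields (a, b).
rows = [(1, "x"), (2, "y"), (2, "y"), (2, "z"), (3, "w")]
print(count_distinct_filtered(rows))
```

          In this model the filter only ever receives individual tuples, and the bag-level function runs once per group at the end. Handing the output of distinct (a bag) to a step that expects single tuples is the kind of mismatch the script in the description runs into.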

          Olga Natkovich added a comment -

          In the latest code, the script does not even parse. I get the following error:

          2011-10-04 15:42:16,197 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1200: Pig script failed to parse:
          <line 5, column 68> expression is not a project expression: (Name: UserFunc(org.apache.pig.builtin.Distinct) Type: null Uid: null)

          Thejas M Nair added a comment -

          In Pig 0.7, the script gives the following error during query execution:

          java.lang.ClassCastException: org.apache.pig.data.InternalDistinctBag cannot be cast to org.apache.pig.data.Tuple
          at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POFilter.getNext(POFilter.java:103)
          at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.getNext(PhysicalOperator.java:322)
          at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject.processInputBag(POProject.java:477)
          at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.PORelationToExprProject.getNext(PORelationToExprProject.java:107)
          at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.processInput(POUserFunc.java:143)
          at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:195)
          at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:290)
          at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:360)
          at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:290)
          at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Reduce.runPipeline(PigMapReduce.java:436)
          at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Reduce.processOnePackageOutput(PigMapReduce.java:404)
          at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Reduce.reduce(PigMapReduce.java:384)
          at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Reduce.reduce(PigMapReduce.java:252)
          at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:176)
          at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:566)
          at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:408)
          at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:216)


            People

            • Assignee:
              Daniel Dai
              Reporter:
              Thejas M Nair
            • Votes:
              1
              Watchers:
              1
