Uploaded image for project: 'Pig'
  1. Pig
  2. PIG-5309

Problem with tez + union + replicated join

    XMLWordPrintableJSON

    Details

    • Type: Bug
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: 0.17.0
    • Fix Version/s: 0.18.0, 0.17.1
    • Component/s: tez
    • Labels:
      None

      Description

      I've been using Pig 0.12.1 for quite some time and am finally upgrading to 0.17. One of my existing scripts failed. I have a workaround (SET pig.tez.opt.union false), but I thought I'd pass on the problem I observed.

      In stdout:

      ERROR 2017: Internal error creating job configuration.
      

      In the Pig log:

      Caused by: java.lang.IllegalArgumentException: Edge [scope-93 : org.apache.pig.backend.hadoop.executionengine.tez.runtime.PigProcessor] -> [scope-83 : org.apache.pig.backend.hadoop.executionengine.tez.runtime.PigProcessor] ({ BROADCAST : org.apache.tez.runtime.library.input.UnorderedKVInput >> PERSISTED >> org.apache.tez.runtime.library.output.UnorderedKVOutput >> NullEdgeManager }) already defined!
      	at org.apache.tez.dag.api.DAG.addEdge(DAG.java:272)
      	at org.apache.pig.backend.hadoop.executionengine.tez.TezDagBuilder.visitTezOp(TezDagBuilder.java:404)
      	at org.apache.pig.backend.hadoop.executionengine.tez.plan.TezOperator.visit(TezOperator.java:259)
      	at org.apache.pig.backend.hadoop.executionengine.tez.plan.TezOperator.visit(TezOperator.java:56)
      	at org.apache.pig.impl.plan.DependencyOrderWalker.walk(DependencyOrderWalker.java:87)
      	at org.apache.pig.impl.plan.PlanVisitor.visit(PlanVisitor.java:46)
      	at org.apache.pig.backend.hadoop.executionengine.tez.TezJobCompiler.buildDAG(TezJobCompiler.java:69)
      	at org.apache.pig.backend.hadoop.executionengine.tez.TezJobCompiler.getJob(TezJobCompiler.java:120)
      	... 20 more
      

      I played around with a minimum viable test script and can cause this to fail:

      weblogs = LOAD '/tmp/in/weblogInfo' as (path:chararray, queryMap:map[chararray]); 
      featureToExtraData = LOAD '/tmp/in/featureToExtraData' as (feature:chararray, extraData:chararray); 
      
      oldA = FILTER weblogs BY path == '/A';
      newA = FILTER weblogs BY path == '/somethingElse';
      B = FILTER weblogs BY path == '/B';
      
      oldAFeatures = FOREACH oldA GENERATE queryMap#'feature1' as feature1, queryMap#'feature2' as feature2;
      newAFeatures = FOREACH newA GENERATE queryMap#'different1' as feature1, queryMap#'different2' as feature2;
      AFeatures = UNION oldAFeatures, newAFeatures;
      AFeaturesPlusMore = JOIN AFeatures BY feature1 LEFT, featureToExtraData BY feature USING 'replicated';
      
      BFeatures = FOREACH B GENERATE queryMap#'somethingElseEntirely1' as feature1, queryMap#'somethingElseEntirely2' as feature2;
      BFeaturesPlusMore = JOIN BFeatures BY feature1 LEFT, featureToExtraData BY feature USING 'replicated';
      
      STORE AFeaturesPlusMore INTO '/tmp/out/1/AFeaturesPlusMore';
      STORE BFeaturesPlusMore INTO '/tmp/out/1/BFeaturesPlusMore';
      

        Attachments

          Activity

            People

            • Assignee:
              rohini Rohini Palaniswamy
              Reporter:
              oberman Will Oberman
            • Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

              • Created:
                Updated: