Pig / PIG-3020

"Duplicate uid in schema" error when joining two relations derived from the same load statement

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.11
    • Fix Version/s: 0.11, 0.12.0
    • Component/s: None
    • Labels: None
    • Patch Info: Patch Available

    Description

      The following script validates OK with Pig 0.9 but fails with the error below in 0.11 (and, I suspect, 0.10) when checked with:

      pig -c debug2.pig

      Script: debug2.pig

      A = LOAD 'foo' AS (group:tuple(uid, dst_id), uids_with_recs:bag{} , uids_with_flock:bag{});
      edges_both = FILTER A BY NOT IsEmpty(uids_with_recs) AND NOT IsEmpty(uids_with_flock);
      edges_both = FOREACH edges_both GENERATE
          group.uid AS src_id,
          group.dst_id AS dst_id;
      both_counts = GROUP edges_both BY src_id;
      both_counts = FOREACH both_counts GENERATE
          group AS src_id, SIZE(edges_both) AS size_both;
      
      edges_bq = FILTER A BY NOT IsEmpty(uids_with_recs);
      edges_bq = FOREACH edges_bq GENERATE
          group.uid AS src_id,
          group.dst_id AS dst_id;
      bq_counts = GROUP edges_bq BY src_id;
      bq_counts = FOREACH bq_counts GENERATE
          group AS src_id, SIZE(edges_bq) AS size_bq;
      
      per_user_set_sizes = JOIN bq_counts BY src_id LEFT OUTER, both_counts BY src_id;
      store per_user_set_sizes into  'foo';
      

      Error:

      ERROR 2270: Logical plan invalid state: duplicate uid in schema : bq_counts::src_id#417:bytearray,bq_counts::size_bq#468:long,both_counts::src_id#417:bytearray,both_counts::size_both#480:long
      
      org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1067: Unable to explain alias null
      	at org.apache.pig.PigServer.explain(PigServer.java:999)
      	at org.apache.pig.tools.grunt.GruntParser.explainCurrentBatch(GruntParser.java:398)
      	at org.apache.pig.tools.grunt.GruntParser.processExplain(GruntParser.java:330)
      	at org.apache.pig.tools.grunt.Grunt.checkScript(Grunt.java:98)
      	at org.apache.pig.Main.run(Main.java:600)
      	at org.apache.pig.Main.main(Main.java:154)
      	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
      	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
      	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
      	at java.lang.reflect.Method.invoke(Method.java:597)
      	at org.apache.hadoop.util.RunJar.main(RunJar.java:186)
      Caused by: org.apache.pig.impl.logicalLayer.FrontendException: ERROR 2000: Error processing rule LoadTypeCastInserter
      	at org.apache.pig.newplan.optimizer.PlanOptimizer.optimize(PlanOptimizer.java:122)
      	at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.compile(HExecutionEngine.java:277)
      	at org.apache.pig.PigServer.compilePp(PigServer.java:1322)
      	at org.apache.pig.PigServer.explain(PigServer.java:984)
      	... 10 more
      Caused by: org.apache.pig.impl.plan.PlanValidationException: ERROR 2270: Logical plan invalid state: duplicate uid in schema : bq_counts::src_id#417:bytearray,bq_counts::size_bq#468:long,both_counts::src_id#417:bytearray,both_counts::size_both#480:long
      	at org.apache.pig.newplan.logical.optimizer.SchemaResetter.validate(SchemaResetter.java:232)
      	at org.apache.pig.newplan.logical.optimizer.SchemaResetter.visit(SchemaResetter.java:105)
      	at org.apache.pig.newplan.logical.relational.LOJoin.accept(LOJoin.java:171)
      	at org.apache.pig.newplan.DependencyOrderWalker.walk(DependencyOrderWalker.java:75)
      	at org.apache.pig.newplan.PlanVisitor.visit(PlanVisitor.java:52)
      	at org.apache.pig.newplan.logical.optimizer.SchemaPatcher.transformed(SchemaPatcher.java:43)
      	at org.apache.pig.newplan.optimizer.PlanOptimizer.optimize(PlanOptimizer.java:113)
      	... 13 more
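
The error message shows both `src_id` columns of the join output carrying the same uid (417), which is what the validation rejects. As a rough illustration only, here is a minimal Python sketch of that kind of uniqueness check. The names `Field`, `join_schema`, and `duplicate_uids` are hypothetical and are not Pig's API; the uids and types are copied from the error message above, and the sketch does not reproduce Pig's actual `SchemaResetter` logic.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Field:
    """Hypothetical stand-in for one column of a logical schema."""
    alias: str
    uid: int
    dtype: str


def join_schema(left_name, left, right_name, right):
    """Concatenate two input schemas, prefixing aliases as in the error message."""
    return (
        [Field(f"{left_name}::{f.alias}", f.uid, f.dtype) for f in left]
        + [Field(f"{right_name}::{f.alias}", f.uid, f.dtype) for f in right]
    )


def duplicate_uids(schema):
    """Return the sorted uids that occur more than once in a schema."""
    counts = Counter(f.uid for f in schema)
    return sorted(uid for uid, n in counts.items() if n > 1)


# Values taken from the ERROR 2270 message: both src_id columns trace back
# to the same field of the single LOAD, so they carry the same uid.
bq_counts = [Field("src_id", 417, "bytearray"), Field("size_bq", 468, "long")]
both_counts = [Field("src_id", 417, "bytearray"), Field("size_both", 480, "long")]

joined = join_schema("bq_counts", bq_counts, "both_counts", both_counts)
print(duplicate_uids(joined))
```

In this model each input schema is valid on its own; the collision only appears once the join concatenates them, which matches the validation failing at `LOJoin` in the trace.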
      

      Attachments

        1. PIG-3020-2.patch (10 kB, Jonathan Coveney)
        2. PIG-3020-2_ws.patch (17 kB, Jonathan Coveney)
        3. PIG-3093-testcase.patch (3 kB, Jonathan Coveney)
        4. PIG-3020_branch-0.11_1.patch (11 kB, Julien Le Dem)
        5. PIG-3020.patch (14 kB, Julien Le Dem)

            People

              jcoveney Jonathan Coveney
              julienledem Julien Le Dem