Description
The script below validates OK with Pig 0.9 but fails with the error shown further down in 0.11 (and, I suspect, 0.10). Command used:
pig -c debug2.pig
Script: debug2.pig
A = LOAD 'foo' AS (group:tuple(uid, dst_id), uids_with_recs:bag{}, uids_with_flock:bag{});
edges_both = FILTER A BY NOT IsEmpty(uids_with_recs) AND NOT IsEmpty(uids_with_flock);
edges_both = FOREACH edges_both GENERATE group.uid AS src_id, group.dst_id AS dst_id;
both_counts = GROUP edges_both BY src_id;
both_counts = FOREACH both_counts GENERATE group AS src_id, SIZE(edges_both) AS size_both;
edges_bq = FILTER A BY NOT IsEmpty(uids_with_recs);
edges_bq = FOREACH edges_bq GENERATE group.uid AS src_id, group.dst_id AS dst_id;
bq_counts = GROUP edges_bq BY src_id;
bq_counts = FOREACH bq_counts GENERATE group AS src_id, SIZE(edges_bq) AS size_bq;
per_user_set_sizes = JOIN bq_counts BY src_id LEFT OUTER, both_counts BY src_id;
store per_user_set_sizes into 'foo';
Error:
ERROR 2270: Logical plan invalid state: duplicate uid in schema : bq_counts::src_id#417:bytearray,bq_counts::size_bq#468:long,both_counts::src_id#417:bytearray,both_counts::size_both#480:long
org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1067: Unable to explain alias null
    at org.apache.pig.PigServer.explain(PigServer.java:999)
    at org.apache.pig.tools.grunt.GruntParser.explainCurrentBatch(GruntParser.java:398)
    at org.apache.pig.tools.grunt.GruntParser.processExplain(GruntParser.java:330)
    at org.apache.pig.tools.grunt.Grunt.checkScript(Grunt.java:98)
    at org.apache.pig.Main.run(Main.java:600)
    at org.apache.pig.Main.main(Main.java:154)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:186)
Caused by: org.apache.pig.impl.logicalLayer.FrontendException: ERROR 2000: Error processing rule LoadTypeCastInserter
    at org.apache.pig.newplan.optimizer.PlanOptimizer.optimize(PlanOptimizer.java:122)
    at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.compile(HExecutionEngine.java:277)
    at org.apache.pig.PigServer.compilePp(PigServer.java:1322)
    at org.apache.pig.PigServer.explain(PigServer.java:984)
    ... 10 more
Caused by: org.apache.pig.impl.plan.PlanValidationException: ERROR 2270: Logical plan invalid state: duplicate uid in schema : bq_counts::src_id#417:bytearray,bq_counts::size_bq#468:long,both_counts::src_id#417:bytearray,both_counts::size_both#480:long
    at org.apache.pig.newplan.logical.optimizer.SchemaResetter.validate(SchemaResetter.java:232)
    at org.apache.pig.newplan.logical.optimizer.SchemaResetter.visit(SchemaResetter.java:105)
    at org.apache.pig.newplan.logical.relational.LOJoin.accept(LOJoin.java:171)
    at org.apache.pig.newplan.DependencyOrderWalker.walk(DependencyOrderWalker.java:75)
    at org.apache.pig.newplan.PlanVisitor.visit(PlanVisitor.java:52)
    at org.apache.pig.newplan.logical.optimizer.SchemaPatcher.transformed(SchemaPatcher.java:43)
    at org.apache.pig.newplan.optimizer.PlanOptimizer.optimize(PlanOptimizer.java:113)
    ... 13 more
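
To make the collision in the ERROR 2270 message easier to see, here is a small Python sketch that parses the schema string from the error and lists any uid shared by more than one field. The helper name duplicate_uids is hypothetical and not part of Pig; the only input is the schema text copied from the message. It only restates what the error reports and does not diagnose the cause inside the optimizer.

```python
import re
from collections import defaultdict

# Schema string copied verbatim from the ERROR 2270 message.
SCHEMA = (
    "bq_counts::src_id#417:bytearray,bq_counts::size_bq#468:long,"
    "both_counts::src_id#417:bytearray,both_counts::size_both#480:long"
)


def duplicate_uids(schema):
    """Return {uid: [field names]} for every uid used by more than one field.

    Hypothetical helper for reading the error message; not a Pig API.
    Each field is expected to look like  name#uid:type.
    """
    by_uid = defaultdict(list)
    for field in schema.split(","):
        match = re.fullmatch(r"(?P<name>.+)#(?P<uid>\d+):(?P<type>\w+)", field.strip())
        if match:
            by_uid[int(match.group("uid"))].append(match.group("name"))
    return {uid: names for uid, names in by_uid.items() if len(names) > 1}


print(duplicate_uids(SCHEMA))
```

Both sides of the join carry src_id with uid 417, which is at least consistent with both columns being derived from the same group.uid field of A in the script.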