Details
Type: Bug
Status: Resolved
Priority: Major
Resolution: Duplicate
Affects Version/s: 0.9.0, 0.10.0
Fix Version/s: None
Component/s: None
Labels: None
Patch Info: Patch Available
Description
When running illustrate on this script:
S1 = LOAD 'excite.log.bz2' AS (user_id:chararray, timestamp:chararray, query:chararray);
S2 = LOAD 'excite.log.bz2' AS (user_id:chararray, timestamp:chararray, query:chararray);
C = cogroup S1 BY user_id INNER, S2 by user_id INNER;
D = foreach C generate group, flatten(S1.timestamp) as t1, flatten(S2.timestamp) as t2;
STORE D INTO 'output';
I get this exception:
2011-11-03 20:49:13,577 [main] ERROR org.apache.pig.PigServer - java.lang.ClassCastException: java.lang.String cannot be cast to org.apache.pig.data.Tuple
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.illustratorMarkup2(POForEach.java:683)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.createTuple(POForEach.java:458)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:404)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POOptimizedForEach.getNext(POOptimizedForEach.java:124)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POJoinPackage.getNext(POJoinPackage.java:222)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.processOnePackageOutput(PigGenericMapReduce.java:416)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.reduce(PigGenericMapReduce.java:399)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.reduce(PigGenericMapReduce.java:1)
    at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:176)
    at org.apache.pig.pen.LocalMapReduceSimulator.launchPig(LocalMapReduceSimulator.java:215)
    ...
The casting error itself is easy to fix (by checking each element's type before casting), but that results in the tracked lineages missing tuples. The method POForEach.illustratorMarkup2 seems to be based on the incorrect assumption that the Object[] is an array of tuples rather than an array of fields for the output tuple. It's not clear to me whether this function needs to be modified or whether there's a bigger problem somewhere else in the code.
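To make the trade-off concrete, here is a minimal, self-contained sketch of the two behaviours described above. It does not use Pig's classes: FakeTuple, countUnchecked, and collectTuples are hypothetical stand-ins invented for illustration, and the real POForEach.illustratorMarkup2 may well be structured differently. The sketch only assumes what the report states, namely that the array holds the output tuple's fields (here plain strings) rather than tuples.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class FieldCastSketch {
    // Hypothetical stand-in for org.apache.pig.data.Tuple; not the real Pig class.
    static final class FakeTuple {
        final List<Object> fields;
        FakeTuple(Object... fields) { this.fields = Arrays.asList(fields); }
    }

    // Mirrors the failing assumption: every array element is treated as a tuple.
    static int countUnchecked(Object[] out) {
        int n = 0;
        for (Object o : out) {
            FakeTuple t = (FakeTuple) o; // ClassCastException when o is a String field
            n += t.fields.size();
        }
        return n;
    }

    // Type-checked variant: avoids the exception, but scalar fields are skipped,
    // so nothing is recorded for them (the "missing tuples" symptom).
    static List<FakeTuple> collectTuples(Object[] out) {
        List<FakeTuple> tracked = new ArrayList<>();
        for (Object o : out) {
            if (o instanceof FakeTuple) {
                tracked.add((FakeTuple) o);
            }
        }
        return tracked;
    }

    public static void main(String[] args) {
        // Fields of one flattened output row: group, t1, t2 (all chararray).
        Object[] out = { "user1", "970916105432", "970916105433" };

        boolean failed = false;
        try {
            countUnchecked(out);
        } catch (ClassCastException e) {
            failed = true;
        }
        System.out.println("unchecked cast failed: " + failed);
        System.out.println("tuples tracked with type check: " + collectTuples(out).size());
    }
}
```

With a row made only of scalar fields, the guarded variant no longer throws but also tracks nothing, which is consistent with the lineage gaps mentioned above. Whether the proper fix belongs in this method or in its caller remains open.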