Details
-
Bug
-
Status: Closed
-
Major
-
Resolution: Fixed
-
None
-
None
-
None
-
Pig + Hadoop 17
Description
A = load '...' USING PigStorage('\t') AS (c1, c2, c3, n1);
B = group A by (c1,c2,c3);
C = foreach B generate flatten(group), SUM(A.n1);
store C into ...;
Runs with combiner and errors out.
java.io.IOException: For input string: "..." additional info: iteration = 1bag size = 2 partial sum = 0.0
previous tupple = (...)
at org.apache.pig.builtin.SUM.sum(SUM.java:95)
at org.apache.pig.builtin.SUM$Final.exec(SUM.java:63)
at org.apache.pig.builtin.SUM$Final.exec(SUM.java:60)
at org.apache.pig.impl.eval.FuncEvalSpec$1.add(FuncEvalSpec.java:116)
at org.apache.pig.impl.eval.GenerateSpec$CrossProductItem.<init>(GenerateSpec.java:159)
at org.apache.pig.impl.eval.GenerateSpec$1.add(GenerateSpec.java:79)
at org.apache.pig.backend.hadoop.executionengine.mapreduceExec.PigMapReduce.reduce(PigMapReduce.java:165)
at org.apache.pig.backend.hadoop.executionengine.mapreduceExec.PigMapReduce.reduce(PigMapReduce.java:80)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:391)
at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2124)
Work-around was that I put out combiner:
C = foreach B generate SUM(A.n1),flatten(group);
and it worked. Input data has some private information in it so I cannot post it. Let me know if it was not possible to solve it without having it. Then we compile a similar input.
c1,c2,c3 are alphabetic,
n1 is numeric.