Description
Consider the following pig script which contains a UDF known as MYUDF. MYUDF is a dummy UDF which takes in a Bag E and a set of integers as offsets.
register myudf.jar; A = load 'visits.txt' using PigStorage() as ( name:chararray, url:chararray, timestamp:chararray); B = filter A by ( (name is not null) AND (timestamp is not null) ); C = group B by ( url ); D = foreach C { E = order B by timestamp; generate E; } G = foreach D generate param.MYUDF(E, -1, 0, 1); --this works --param.MYUDF(E,'-1','0','1'); explain G; dump G;
If you execute the above script, it fails during the reducer phase where the POUserFunc(MYUDF)[bag] is being called. The MYUDF code is infact not called but somehow the parameters passed to the MYUDF cause the exception in the reduce plan. If you replace the offsets -1,0,1 with '-1', '0', '1' (strings) the UDF seems to get called and the script works fine.
=============================================================================================================================
java.io.IOException: Received Error while processing the reduce plan.
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Reduce.runPipeline(PigMapReduce.java:307)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Reduce.processOnePackageOutput(PigMapReduce.java:247)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Reduce.reduce(PigMapReduce.java:224)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Reduce.reduce(PigMapReduce.java:136)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:318)
at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2209)
=============================================================================================================================