Details
Type: Bug
Status: Closed
Priority: Minor
Resolution: Fixed
Description
When accessing a nonexistent column in the output of a streaming command, I got the following error, which is not very meaningful:
[main] ERROR org.apache.pig.tools.grunt.Grunt -
Here is the sample Pig script that I used. The data file has only 3 tab-separated fields, so the streaming command on line 3 generates tuples with 3 columns ($0 through $2). On line 4 the script tries to access $4, a column that does not exist, and thus the error above occurs.
A = load 'data';
B = foreach A generate $2, $1, $0;
C = stream B through `awk 'BEGIN {FS = "\t"; OFS = "\t"} {print $3, $2, $1}'`;
D = foreach C generate $4;
store D into 'results';
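To make the column mismatch concrete, here is a minimal Python sketch (not Pig code) that models the script's steps on one row. The function names `awk_reverse` and `project` are hypothetical, the sample row is reconstructed from the tuple shown in the stack trace further down, and the error text is only one possible example of a more meaningful message, not what Pig prints.

```python
def awk_reverse(fields):
    """Model of the awk command: print $3, $2, $1 (awk fields are 1-based)."""
    return [fields[2], fields[1], fields[0]]


def project(tuple_fields, index):
    """Model of a Pig positional reference $index (0-based), with a descriptive error."""
    if index >= len(tuple_fields):
        raise IndexError(
            f"Requested column ${index}, but the tuple has only "
            f"{len(tuple_fields)} columns"
        )
    return tuple_fields[index]


# One input row with 3 tab-separated fields (assumed from the logged tuple).
row = "rachel ovid\t21\t2.93".split("\t")

b = [row[2], row[1], row[0]]  # B = foreach A generate $2, $1, $0
c = awk_reverse(b)            # C = stream B through `awk ...`

try:
    d = project(c, 4)         # D = foreach C generate $4
except IndexError as err:
    print(err)
```

Both `b` and `c` have exactly three fields, so any positional reference above `$2` is out of range regardless of whether the streaming step is present.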
Here is what happens on my machine:
grunt> A = load 'data';
grunt> B = foreach A generate $2, $1, $0;
grunt> stream B through `awk 'BEGIN {FS = "\t"; OFS = "\t"} {print $3, $2, $1}'`;
grunt> D = foreach C generate $4;
2008-03-11 18:53:00,376 [main] ERROR org.apache.pig.tools.grunt.GruntParser -
grunt>
A related note: Pig behaves differently if there is no streaming command in the script. In that case, the statement is accepted and an IndexOutOfBoundsException is thrown at runtime.
grunt> A = load 'data';
grunt> B = foreach A generate $2, $1, $0;
grunt> D = foreach B generate $4;
grunt> store D into 'results';
2008-03-11 18:57:40,107 [main] INFO org.apache.pig.backend.hadoop.executionengine.POMapreduce - ----- MapReduce Job -----
2008-03-11 18:57:40,108 [main] INFO org.apache.pig.backend.hadoop.executionengine.POMapreduce - Input: [data:org.apache.pig.builtin.PigStorage()]
2008-03-11 18:57:40,109 [main] INFO org.apache.pig.backend.hadoop.executionengine.POMapreduce - Map: [[*]->GENERATE {[PROJECT $2],[PROJECT $1],[PROJECT $0]}->GENERATE {[PROJECT $4]}]
2008-03-11 18:57:40,109 [main] INFO org.apache.pig.backend.hadoop.executionengine.POMapreduce - Group: null
2008-03-11 18:57:40,110 [main] INFO org.apache.pig.backend.hadoop.executionengine.POMapreduce - Combine: null
2008-03-11 18:57:40,110 [main] INFO org.apache.pig.backend.hadoop.executionengine.POMapreduce - Reduce: null
2008-03-11 18:57:40,111 [main] INFO org.apache.pig.backend.hadoop.executionengine.POMapreduce - Output: results:org.apache.pig.builtin.PigStorage
2008-03-11 18:57:40,111 [main] INFO org.apache.pig.backend.hadoop.executionengine.POMapreduce - Split: null
2008-03-11 18:57:40,112 [main] INFO org.apache.pig.backend.hadoop.executionengine.POMapreduce - Map parallelism: -1
2008-03-11 18:57:40,112 [main] INFO org.apache.pig.backend.hadoop.executionengine.POMapreduce - Reduce parallelism: -1
2008-03-11 18:57:42,391 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapreduceExec.MapReduceLauncher - Pig progress = 0%
2008-03-11 18:57:59,466 [main] ERROR org.apache.pig.backend.hadoop.executionengine.mapreduceExec.MapReduceLauncher - Error message from task (map) tip_200802211201_1494_m_000000
java.lang.IndexOutOfBoundsException: Requested index 4 from tuple (2.93, 21, rachel ovid)
    at org.apache.pig.data.Tuple.getField(Tuple.java:153)
    at org.apache.pig.impl.eval.ProjectSpec.eval(ProjectSpec.java:84)
    at org.apache.pig.impl.eval.SimpleEvalSpec$1.add(SimpleEvalSpec.java:35)
    at org.apache.pig.impl.eval.GenerateSpec$CrossProductItem.exec(GenerateSpec.java:261)
    at org.apache.pig.impl.eval.GenerateSpec$1.add(GenerateSpec.java:86)
    at org.apache.pig.impl.eval.GenerateSpec$CrossProductItem.add(GenerateSpec.java:230)
    at org.apache.pig.impl.eval.collector.UnflattenCollector.add(UnflattenCollector.java:52)
    at org.apache.pig.impl.eval.collector.DataCollector.addToSuccessor(DataCollector.java:93)
    at org.apache.pig.impl.eval.SimpleEvalSpec$1.add(SimpleEvalSpec.java:35)
    at org.apache.pig.impl.eval.GenerateSpec$CrossProductItem.exec(GenerateSpec.java:261)
    at org.apache.pig.impl.eval.GenerateSpec$1.add(GenerateSpec.java:86)
    at org.apache.pig.backend.hadoop.executionengine.mapreduceExec.PigMapReduce.run(PigMapReduce.java:113)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:192)
    at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:1760)
java.lang.IndexOutOfBoundsException: Requested index 4 from tuple (2.93, 21, rachel ovid)
    at org.apache.pig.data.Tuple.getField(Tuple.java:153)
    at org.apache.pig.impl.eval.ProjectSpec.eval(ProjectSpec.java:84)
    at org.apache.pig.impl.eval.SimpleEvalSpec$1.add(SimpleEvalSpec.java:35)
    at org.apache.pig.impl.eval.GenerateSpec$CrossProductItem.exec(GenerateSpec.java:261)
    at org.apache.pig.impl.eval.GenerateSpec$1.add(GenerateSpec.java:86)
    at org.apache.pig.impl.eval.GenerateSpec$CrossProductItem.add(GenerateSpec.java:230)
    at org.apache.pig.impl.eval.collector.UnflattenCollector.add(UnflattenCollector.java:52)
    at org.apache.pig.impl.eval.collector.DataCollector.addToSuccessor(DataCollector.java:93)
    at org.apache.pig.impl.eval.SimpleEvalSpec$1.add(SimpleEvalSpec.java:35)
    at org.apache.pig.impl.eval.GenerateSpec$CrossProductItem.exec(GenerateSpec.java:261)
    at org.apache.pig.impl.eval.GenerateSpec$1.add(GenerateSpec.java:86)
    at org.apache.pig.backend.hadoop.executionengine.mapreduceExec.PigMapReduce.run(PigMapReduce.java:113)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:192)
    at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:1760)
java.lang.IndexOutOfBoundsException: Requested index 4 from tuple (2.93, 21, rachel ovid)
    at org.apache.pig.data.Tuple.getField(Tuple.java:153)
    at org.apache.pig.impl.eval.ProjectSpec.eval(ProjectSpec.java:84)
    at org.apache.pig.impl.eval.SimpleEvalSpec$1.add(SimpleEvalSpec.java:35)
    at org.apache.pig.impl.eval.GenerateSpec$CrossProductItem.exec(GenerateSpec.java:261)
    at org.apache.pig.impl.eval.GenerateSpec$1.add(GenerateSpec.java:86)
    at org.apache.pig.impl.eval.GenerateSpec$CrossProductItem.add(GenerateSpec.java:230)
    at org.apache.pig.impl.eval.collector.UnflattenCollector.add(UnflattenCollector.java:52)
    at org.apache.pig.impl.eval.collector.DataCollector.addToSuccessor(DataCollector.java:93)
    at org.apache.pig.impl.eval.SimpleEvalSpec$1.add(SimpleEvalSpec.java:35)
    at org.apache.pig.impl.eval.GenerateSpec$CrossProductItem.exec(GenerateSpec.java:261)
    at org.apache.pig.impl.eval.GenerateSpec$1.add(GenerateSpec.java:86)
    at org.apache.pig.backend.hadoop.executionengine.mapreduceExec.PigMapReduce.run(PigMapReduce.java:113)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:192)
    at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:1760)
java.lang.IndexOutOfBoundsException: Requested index 4 from tuple (2.93, 21, rachel ovid)
    at org.apache.pig.data.Tuple.getField(Tuple.java:153)
    at org.apache.pig.impl.eval.ProjectSpec.eval(ProjectSpec.java:84)
    at org.apache.pig.impl.eval.SimpleEvalSpec$1.add(SimpleEvalSpec.java:35)
    at org.apache.pig.impl.eval.GenerateSpec$CrossProductItem.exec(GenerateSpec.java:261)
    at org.apache.pig.impl.eval.GenerateSpec$1.add(GenerateSpec.java:86)
    at org.apache.pig.impl.eval.GenerateSpec$CrossProductItem.add(GenerateSpec.java:230)
    at org.apache.pig.impl.eval.collector.UnflattenCollector.add(UnflattenCollector.java:52)
    at org.apache.pig.impl.eval.collector.DataCollector.addToSuccessor(DataCollector.java:93)
    at org.apache.pig.impl.eval.SimpleEvalSpec$1.add(SimpleEvalSpec.java:35)
    at org.apache.pig.impl.eval.GenerateSpec$CrossProductItem.exec(GenerateSpec.java:261)
    at org.apache.pig.impl.eval.GenerateSpec$1.add(GenerateSpec.java:86)
    at org.apache.pig.backend.hadoop.executionengine.mapreduceExec.PigMapReduce.run(PigMapReduce.java:113)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:192)
    at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:1760)
2008-03-11 18:57:59,469 [main] ERROR org.apache.pig.backend.hadoop.executionengine.mapreduceExec.MapReduceLauncher - Error message from task (reduce) tip_200802211201_1494_r_000000
2008-03-11 18:57:59,470 [main] ERROR org.apache.pig.tools.grunt.GruntParser - Unable to store alias D