Details
Type: Bug
Status: Closed
Priority: Major
Resolution: Fixed
0.0.0, 0.1.0, 0.2.0, site
None
None
Environment: Pig 0.1.1/Linux / 8 Nodes hadoop 0.18.2
Description
I don't believe bzip2 input to Pig is working, at least not with large files. It seems as though the map inputs are getting cut off. The maps complete far too quickly, and the rows of data that Pig tries to process are often cut at a random point and left incomplete. Here are my symptoms:
- Maps seem to be completing at an unbelievably fast rate
With uncompressed data
Status: Succeeded
Started at: Wed Dec 17 21:31:10 EST 2008
Finished at: Wed Dec 17 22:42:09 EST 2008
Finished in: 1hrs, 10mins, 59sec
Kind    % Complete  Num Tasks  Pending  Running  Complete  Killed  Failed/Killed Task Attempts
map     100.00%     4670       0        0        4670      0       0 / 21
reduce  57.72%      13         0        0        13        0       0 / 4
With bzip2-compressed data
Started at: Wed Dec 17 21:17:28 EST 2008
Failed at: Wed Dec 17 21:17:52 EST 2008
Failed in: 24sec
Black-listed TaskTrackers: 2
Kind    % Complete  Num Tasks  Pending  Running  Complete  Killed  Failed/Killed Task Attempts
map     100.00%     183        0        0        15        168     54 / 22
reduce  100.00%     13         0        0        0         13      0 / 0
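For anyone comparing the two runs, the elapsed times can be recomputed from the Started/Finished stamps quoted above. This is only a small Python sketch; `elapsed_seconds` is a hypothetical helper written for this report, not part of Pig or Hadoop, and it assumes both stamps of a run are in the same time zone.

```python
from datetime import datetime

def elapsed_seconds(start, end):
    """Seconds between two JobTracker-style stamps in the same time zone."""
    fmt = "%a %b %d %H:%M:%S %Y"

    def parse(stamp):
        # Drop the zone token (e.g. "EST"): both stamps share it, and
        # strptime's %Z handling is platform-dependent.
        parts = stamp.split()
        return datetime.strptime(" ".join(parts[:4] + parts[5:]), fmt)

    return int((parse(end) - parse(start)).total_seconds())

# Stamps copied from the two runs in this report.
uncompressed = elapsed_seconds("Wed Dec 17 21:31:10 EST 2008",
                               "Wed Dec 17 22:42:09 EST 2008")
bzip2_run = elapsed_seconds("Wed Dec 17 21:17:28 EST 2008",
                            "Wed Dec 17 21:17:52 EST 2008")
print(uncompressed, bzip2_run)
```

The results agree with the "Finished in" and "Failed in" figures shown by the JobTracker (1hrs, 10mins, 59sec and 24sec).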
The errors we get:
java.lang.IndexOutOfBoundsException: Requested index 11 from tuple (rec A, 0HAW, CHIX, )
at org.apache.pig.data.Tuple.getField(Tuple.java:176)
at org.apache.pig.impl.eval.ProjectSpec.eval(ProjectSpec.java:84)
at org.apache.pig.impl.eval.SimpleEvalSpec$1.add(SimpleEvalSpec.java:38)
at org.apache.pig.impl.eval.EvalSpec.simpleEval(EvalSpec.java:223)
at org.apache.pig.impl.eval.cond.CompCond.eval(CompCond.java:58)
at org.apache.pig.impl.eval.FilterSpec$1.add(FilterSpec.java:60)
at org.apache.pig.backend.hadoop.executionengine.mapreduceExec.PigMapReduce.run(PigMapReduce.java:117)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:227)
at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2207)
attempt_200812161759_0045_m_000007_0 task_200812161759_0045_m_000007 tsdhb06.factset.com FAILED
java.lang.IndexOutOfBoundsException: Requested index 11 from tuple (rec A, CSGN, VTX, VTX, 0, 20080303, 90919, 380, 1543, 206002)
at org.apache.pig.data.Tuple.getField(Tuple.java:176)
at org.apache.pig.impl.eval.ProjectSpec.eval(ProjectSpec.java:84)
at org.apache.pig.impl.eval.SimpleEvalSpec$1.add(SimpleEvalSpec.java:38)
at org.apache.pig.impl.eval.EvalSpec.simpleEval(EvalSpec.java:223)
at org.apache.pig.impl.eval.cond.CompCond.eval(CompCond.java:58)
at org.apache.pig.impl.eval.FilterSpec$1.add(FilterSpec.java:60)
at org.apache.pig.backend.hadoop.executionengine.mapreduceExec.PigMapReduce.run(PigMapReduce.java:117)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:227)
at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2207)
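Both traces request field index 11 from a tuple that is too short to have one. As a rough way to see how short the rows are, here is a minimal Python sketch that counts the fields in the tuples exactly as printed in the exceptions. `parse_tuple` and `is_truncated` are hypothetical helpers for illustration, not Pig APIs, and the sketch assumes fields never contain commas.

```python
def parse_tuple(text):
    """Split a printed tuple such as '(a, b, c)' into a list of fields."""
    inner = text.strip()
    if inner.startswith("(") and inner.endswith(")"):
        inner = inner[1:-1]
    # Assumption: no field contains a comma, so a plain split is enough.
    return [field.strip() for field in inner.split(",")]

def is_truncated(fields, required_index=11):
    """True when the tuple has no field at required_index (0-based)."""
    return len(fields) <= required_index

# Tuples copied from the two exception messages above.
first = parse_tuple("(rec A, 0HAW, CHIX, )")
second = parse_tuple(
    "(rec A, CSGN, VTX, VTX, 0, 20080303, 90919, 380, 1543, 206002)")

for fields in (first, second):
    print(len(fields), is_truncated(fields))
```

The first tuple has 4 fields (the last one empty) and the second has 10, while index 11 needs at least 12, which is consistent with rows being cut off partway through.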