Affects Version/s: 0.10.0
Fix Version/s: None
Environment: pig-0.10.1, hadoop 0.20.2
I have started using embedded Pig in Python scripts. I need to execute a Pig script with a slightly different set of parameters for each run.
The jobs are quite small, so taking advantage of the cluster and running them in parallel made sense for me.
Here is the Python code I used (I ran it as: bin/pig run.py script.pig):
The NUM_OF_JOBS_TO_RUN_AT_ONCE variable lets me control the parallelism.
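For illustration, here is a hypothetical sketch of this kind of batching (not the original run.py). The parameter sets are split into groups of NUM_OF_JOBS_TO_RUN_AT_ONCE, each group runs in its own threads, and the next group starts only after the current one finishes. `run_pig_job` and `run_in_batches` are made-up names; `run_pig_job` is a placeholder for the real embedded Pig call. The sketch uses Python 3 syntax, and an older Jython runtime may need minor syntax adjustments.

```python
import threading

# Maximum number of Pig executions running concurrently (name from the report).
NUM_OF_JOBS_TO_RUN_AT_ONCE = 10


def run_pig_job(params):
    """Hypothetical stand-in for a single embedded Pig execution.

    A real script would bind `params` to the compiled Pig script and run it;
    this placeholder just returns True.
    """
    return True


def run_in_batches(param_sets, job=run_pig_job,
                   batch_size=NUM_OF_JOBS_TO_RUN_AT_ONCE):
    """Run `job` over `param_sets`, at most `batch_size` threads at a time.

    Returns a dict mapping each parameter set's index to whatever `job`
    returned, or to the exception it raised, so failures are not lost.
    """
    results = {}
    results_lock = threading.Lock()  # guards the shared results dict

    def worker(index, params):
        try:
            outcome = job(params)
        except Exception as exc:  # keep the error for later inspection
            outcome = exc
        with results_lock:
            results[index] = outcome

    for start in range(0, len(param_sets), batch_size):
        batch = param_sets[start:start + batch_size]
        threads = [threading.Thread(target=worker, args=(start + offset, params))
                   for offset, params in enumerate(batch)]
        for t in threads:
            t.start()
        # Wait for the whole batch to finish before starting the next one.
        for t in threads:
            t.join()
    return results
```

You can list the failed runs with `[i for i, r in results.items() if isinstance(r, Exception)]`. Fixed batches keep the logic simple, but one slow job holds up its entire batch. A fixed-size worker pool would avoid that.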
I can have up to 150 parameter sets, which means up to 150 Pig executions.
Everything seemed to work fine, but I started noticing occasional failures of individual job executions.
It happens intermittently: for example, 0-5 out of 150 executions fail, always with the same kind of error.
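Only a few executions fail and the failure is not deterministic, so one pragmatic mitigation to consider is retrying a failed execution a small number of times. This has not been verified against this bug. The sketch below assumes a hypothetical `job` callable that raises on failure. If a run reports failure through its return value instead (for example a stats object), you would need to adapt the check.

```python
def run_with_retries(job, params, max_attempts=3):
    """Call job(params), retrying up to max_attempts times in total.

    Returns (result, attempts_used). Re-raises the last exception if every
    attempt fails. This only helps with intermittent failures; a
    deterministic error will simply fail max_attempts times.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            return job(params), attempt
        except Exception as exc:  # assumption: treat any error as retryable
            last_error = exc
    raise last_error
```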
Full stacktrace attached.
I'm using many UDFs, so the UDF named in the exception varies.
I suspect there is a threading issue somewhere.
My best guess is that org.apache.pig.impl.PigContext.resolveClassName is not thread-safe, and something goes wrong when multiple threads try to resolve a UDF class at the same time.
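If the race really is in UDF class resolution, one untested idea is to let only one thread at a time perform the step that resolves UDF classes, while the jobs themselves still run concurrently. This assumes the resolution step can be separated from job execution, which may not hold for embedded Pig. A minimal sketch, where `prepare`, `execute`, and `run_with_serialized_setup` are hypothetical names:

```python
import threading

# Hypothetical: one process-wide lock shared by all worker threads.
_resolution_lock = threading.Lock()


def run_with_serialized_setup(params, prepare, execute):
    """Run `prepare` under a global lock, then `execute` without it.

    `prepare` stands in for whatever step triggers UDF class resolution
    (an assumption); `execute` stands in for the long-running job. Only one
    thread at a time can be inside `prepare`; `execute` calls may overlap.
    """
    with _resolution_lock:
        prepared = prepare(params)
    return execute(prepared)
```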
I've tried a couple of workarounds in the hope that they would help. To my knowledge, there are 3 ways to register jars containing UDFs:
1) in the Pig script: REGISTER lib/*.jar
2) in Python: Pig.registerJar("/lib/*.jar")
3) as a command-line parameter to the pig command: $PIGDIR/bin/pig -Dpig.additional.jars=lib/*.jar
Initially I used option 1). I thought that registering the jars globally at startup with option 3) might work around the bug. The failures became less frequent, but they did not go away entirely and still appear from time to time.
Unfortunately, I cannot provide a reproducible test case. My process is quite complicated, and presenting it here is infeasible. I tried to strip my scripts down to something quick and simple to share, and ran that with about 1000 parameter sets at a parallelism of 10 or 20, but the failure never occurred.
With pig-0.10.1 I had to replace the bundled Jython dependency with the standalone version; otherwise I couldn't use the Python standard modules.
I couldn't check whether this bug still exists in pig-0.11.0, because that version is incompatible with Hadoop 0.20, and pig-0.11.1 has not been released yet.