There are a couple of issues with this approach. Most of them are not specific to AvroStorage; they come from how we deal with the jars that UDFs depend on:
1. Pig doesn't automatically ship all classes in pig-withouthadoop.jar
We also need to make a code change in JarManager.java to denote the packages to ship. Putting a jar into pig-withouthadoop.jar alone is only equivalent to putting that jar on the classpath. This mechanism is confusing, and we should stop putting more jars into pig-withouthadoop.jar.
2. Conflicts with Hadoop-bundled jars
Hadoop 0.20.204 bundles jackson-1.0.1, which is too old for AvroLoader. In the frontend, we can force Hadoop to take our jackson-1.7.3 by setting the flag HADOOP_USER_CLASSPATH_FIRST=true. But in the backend, Hadoop seems to always pick the bundled jackson-1.0.1, which results in a job failure.
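To confirm which jackson copy actually wins in a given JVM, one option is to ask the class itself where it was loaded from. The following is only a diagnostic sketch, not part of Pig or Hadoop: the class name `WhichJar` and its methods are hypothetical, and `org.codehaus.jackson.map.ObjectMapper` is assumed to be the Jackson 1.x class of interest.

```java
import java.security.CodeSource;

// Hypothetical diagnostic helper; not part of Pig or Hadoop.
public class WhichJar {

    /** Returns the jar or directory a class was loaded from, or a marker if unknown. */
    static String locationOf(Class<?> cls) {
        CodeSource src = cls.getProtectionDomain().getCodeSource();
        // JDK classes loaded by the bootstrap loader have no code source.
        if (src == null || src.getLocation() == null) {
            return "(bootstrap or unknown)";
        }
        return src.getLocation().toString();
    }

    /** Same lookup by class name, so the caller needs no compile-time dependency. */
    static String locationOfName(String className) {
        try {
            return locationOf(Class.forName(className));
        } catch (ClassNotFoundException e) {
            return "(not on classpath)";
        }
    }

    public static void main(String[] args) {
        // Assumed class of interest: Jackson 1.x ObjectMapper.
        String name = args.length > 0 ? args[0] : "org.codehaus.jackson.map.ObjectMapper";
        System.out.println(name + " -> " + locationOfName(name));
    }
}
```

Running this in the frontend with and without HADOOP_USER_CLASSPATH_FIRST=true should show whether the jar location changes. Checking the backend would require calling the same lookup from code that runs inside a task, for example from a UDF.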
3. Do we need to bundle the jars that piggybank depends on?
We don't even bundle hbase.jar, though HbaseLoader is in builtin. Further, these jars are not even in the Pig distribution. They are ivy dependencies and are only retrieved during compilation. My thinking is that we need to bundle some popular jars (hbase.jar, avro.jar, etc.) in lib so users know where to find them when needed. But we don't want to ship all of those jars to the backend. Ideally, Pig should be smart enough to ship jars only when needed (as we do for jython.jar).
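The "ship only when needed" idea could be sketched as a lookup from the functions a script uses to the jars they require. This is a minimal illustration under stated assumptions, not how Pig is implemented: the class `OnDemandJars`, the registry contents, and the use of short function names as keys are all hypothetical.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch; names and registry contents are assumptions.
public class OnDemandJars {

    // Maps a load/store function to the jars it needs at runtime.
    static final Map<String, List<String>> DEPENDENCIES = new HashMap<>();
    static {
        DEPENDENCIES.put("AvroStorage", Arrays.asList("avro.jar"));
        DEPENDENCIES.put("HbaseLoader", Arrays.asList("hbase.jar"));
    }

    /** Returns only the jars required by the functions the script actually uses. */
    static Set<String> jarsToShip(Collection<String> usedFunctions) {
        Set<String> jars = new TreeSet<>();
        for (String function : usedFunctions) {
            // Functions with no registered dependencies contribute nothing.
            jars.addAll(DEPENDENCIES.getOrDefault(function, Collections.<String>emptyList()));
        }
        return jars;
    }

    public static void main(String[] args) {
        // A script that uses AvroStorage but not HbaseLoader ships avro.jar only.
        System.out.println(jarsToShip(Arrays.asList("AvroStorage", "PigStorage")));
    }
}
```

With this shape, every popular jar can sit in lib, while a job that never touches HbaseLoader does not pay for shipping hbase.jar to the backend.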