The attached patch is a first cut at adding this support.
Note that it changes the TupleFactory interface by adding a couple of new methods for creating optimized tuples.
Two flavors of optimized tuples are provided:
1) For single-field tuples, we provide a PrimitiveFieldTuple, which simply wraps a primitive value (or a string).
2) For multi-field tuples, we provide an implementation that holds the data in memory in a single ByteBuffer and deserializes the requested field on read. This incurs a small read-time penalty, but I believe it's a good trade-off: (a) most of the time we only read once, and the allocation costs are much lower than for regular tuples, and (b) the memory overhead is several times lower than for regular tuples, so we'll save on GC.
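To make the multi-field idea concrete, here is a minimal sketch of a tuple that packs its fields into one ByteBuffer and decodes a field only when it is read. This is not the code from the patch: the class name PackedTupleSketch, the type tags, and the offsets array are all assumptions made up for illustration, and it handles only int, long, and double fields.

```java
import java.nio.ByteBuffer;

/** Hypothetical illustration only; not the class from the patch. */
final class PackedTupleSketch {
    // Made-up type tags, one per field.
    private static final byte INT = 0, LONG = 1, DOUBLE = 2;

    private final byte[] types;    // type tag of each field
    private final int[] offsets;   // byte offset of each field in the buffer
    private final ByteBuffer data; // all field values, packed back to back

    PackedTupleSketch(Object... fields) {
        types = new byte[fields.length];
        offsets = new int[fields.length];
        int size = 0;
        // First pass: record each field's type and offset, and total the bytes needed.
        for (int i = 0; i < fields.length; i++) {
            offsets[i] = size;
            if (fields[i] instanceof Integer) {
                types[i] = INT;
                size += Integer.BYTES;
            } else if (fields[i] instanceof Long) {
                types[i] = LONG;
                size += Long.BYTES;
            } else if (fields[i] instanceof Double) {
                types[i] = DOUBLE;
                size += Double.BYTES;
            } else {
                throw new IllegalArgumentException("unsupported field at index " + i);
            }
        }
        // Second pass: one allocation for all values instead of one object per field.
        data = ByteBuffer.allocate(size);
        for (Object f : fields) {
            if (f instanceof Integer) {
                data.putInt((Integer) f);
            } else if (f instanceof Long) {
                data.putLong((Long) f);
            } else {
                data.putDouble((Double) f);
            }
        }
    }

    int size() {
        return types.length;
    }

    /** Decodes field i on demand; this boxing step is the read-time penalty. */
    Object get(int i) {
        switch (types[i]) {
            case INT:
                return data.getInt(offsets[i]);
            case LONG:
                return data.getLong(offsets[i]);
            default:
                return data.getDouble(offsets[i]);
        }
    }
}
```

For example, new PackedTupleSketch(42, 7L, 3.5d).get(1) decodes and returns the long stored at byte offset 4. A real implementation could presumably derive the types and offsets from a shared schema rather than storing them per tuple, which is where most of the memory saving would come from.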
Microbenchmark results can be found in the javadoc for PrimitiveTuple.
Note that so far I haven't changed any behavior in existing Pig code, other than changing one interface. The next step would be to start using these Tuples when possible.
One complication is that, since we don't push much metadata around with tuples, we can only deserialize them into standard tuples, so all savings are lost once we hit an MR boundary. Changing this would require a pretty significant refactor; I'd love to hear ideas from folks who worked on BinInterSedes on how to do this.
So far, I've played with using these in some UDFs that generate large bags of tuples, and the difference in both speed and memory use is fairly dramatic.