Details
Type: Improvement
Status: Open
Priority: Major
Resolution: Unresolved
Version: 0.8.0
Description
Avoid sedes (serialization/deserialization) where possible for data loaded using PigStorage by implementing approach #4 proposed in http://wiki.apache.org/pig/AvoidingSedes.
The write() and readFields() functions of the tuple returned by TupleFactory are used to serialize data between Map and Reduce. By using a tuple that knows the serialization format of the loader, we avoid sedes at the Map-Reduce boundary and use the load function's serialized format between Map and Reduce.
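As a rough illustration of the idea, the sketch below shows a tuple that keeps a delimited line exactly as the loader read it, copies those bytes through write() and readFields() unchanged, and splits out a field only when one is requested. The class name RawLineTuple, the tab delimiter, the length-prefixed layout, and the roundTrip helper are all assumptions made for this example. It does not implement Pig's Tuple interface and is not the implementation proposed in this issue.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical tuple that holds a delimited record in its loaded (serialized) form.
class RawLineTuple {
    private byte[] line = new byte[0];   // the record exactly as the loader read it
    private final byte delimiter;

    RawLineTuple(byte delimiter) { this.delimiter = delimiter; }

    void set(byte[] rawLine) { this.line = rawLine; }

    // Map side: copy the raw bytes through with a length prefix; no field is parsed.
    void write(DataOutput out) throws IOException {
        out.writeInt(line.length);
        out.write(line);
    }

    // Reduce side: read the raw bytes back; still no field is parsed.
    void readFields(DataInput in) throws IOException {
        line = new byte[in.readInt()];
        in.readFully(line);
    }

    // Fields are located lazily, only when a caller asks for one.
    byte[] field(int index) {
        int start = 0, current = 0;
        for (int i = 0; i <= line.length; i++) {
            if (i == line.length || line[i] == delimiter) {
                if (current == index) return Arrays.copyOfRange(line, start, i);
                current++;
                start = i + 1;
            }
        }
        throw new IndexOutOfBoundsException("no field " + index);
    }

    // Simulates the map-to-reduce hop through an in-memory buffer.
    static RawLineTuple roundTrip(byte[] rawLine) {
        try {
            RawLineTuple mapSide = new RawLineTuple((byte) '\t');
            mapSide.set(rawLine);
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            mapSide.write(new DataOutputStream(buffer));

            RawLineTuple reduceSide = new RawLineTuple((byte) '\t');
            reduceSide.readFields(
                new DataInputStream(new ByteArrayInputStream(buffer.toByteArray())));
            return reduceSide;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        RawLineTuple t = roundTrip("alice\t42\tNYC".getBytes(StandardCharsets.UTF_8));
        System.out.println(new String(t.field(1), StandardCharsets.UTF_8));
    }
}
```

The point of the sketch is that neither side of the boundary converts fields into typed objects and back; the cost of parsing is paid only for fields that are actually read.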
To use a new custom tuple for this purpose, a custom TupleFactory that returns tuples of this type has to be specified using the property "pig.data.tuple.factory.name".
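The property-driven selection can be pictured with the stand-in below. Only the property name comes from this issue. SketchTupleFactory, DefaultSketchFactory, and RawLineSketchFactory are hypothetical classes written for the example, and the reflective lookup is an assumption about how such a property could be honored, not a copy of Pig's TupleFactory code.

```java
// Stand-in types for illustration; these are not Pig's classes.
abstract class SketchTupleFactory {
    static final String PROPERTY = "pig.data.tuple.factory.name";

    abstract String describe();

    // Returns the factory named by the property, or a default when it is unset.
    static SketchTupleFactory getInstance() {
        String name = System.getProperty(PROPERTY);
        if (name == null) {
            return new DefaultSketchFactory();
        }
        try {
            return (SketchTupleFactory)
                Class.forName(name).getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("cannot load tuple factory " + name, e);
        }
    }

    public static void main(String[] args) {
        // Equivalent to passing -Dpig.data.tuple.factory.name=<class name> to the JVM.
        System.setProperty(PROPERTY, RawLineSketchFactory.class.getName());
        System.out.println(getInstance().describe());
    }
}

class DefaultSketchFactory extends SketchTupleFactory {
    String describe() { return "default"; }
}

class RawLineSketchFactory extends SketchTupleFactory {
    String describe() { return "raw-line"; }
}
```

Because the choice is made by a single property, every tuple in the job comes from the same factory, which is one way to see why the load functions in a query need a compatible serialized format.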
This approach will work only for a set of load functions in the query that share the same serialization format for maps and bags. If this approach proves to be very useful, it will build a case for a more extensible approach.