Details
Type: Improvement
Status: Closed
Priority: Major
Resolution: Fixed
Version: 0.2.0
Description
Currently, each Tuple read in by Pig is wrapped in a TargetedTuple, which carries a list of operator keys identifying the root operators the tuple is targeted at. For example, in a cogroup query a tuple is destined for one of the two roots of the plan, depending on which input it comes from, and the TargetedTuple holds that information. This adds unnecessary overhead at load time in the map: the extra list has to be attached to every tuple, and on every entry into map() the operators corresponding to the operator keys in the list have to be looked up in the map plan.
This overhead can be eliminated by serializing the list of target operators once at the Record Reader level and deserializing it in the configure() of the map. After deserialization, the operators corresponding to the operator keys can also be looked up in configure() itself. The setup is then done once in configure() rather than adding overhead to each input tuple and each map() call.
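A minimal sketch of the proposed flow, not the actual Pig code. Everything named here is hypothetical: the class TargetOpsSketch, the configuration key "sketch.map.target.ops", the Operator and SketchMapper stand-ins, and the use of a plain Map in place of the Hadoop job configuration. Java object serialization with Base64 is assumed only as one possible way to pass the key list as a string.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Base64;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class TargetOpsSketch {
    // Hypothetical configuration key; not a real Pig property name.
    static final String TARGET_OPS_KEY = "sketch.map.target.ops";

    /** Encode the operator keys once, where the input is set up. */
    static String serialize(List<String> operatorKeys) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(new ArrayList<>(operatorKeys));
        }
        return Base64.getEncoder().encodeToString(bytes.toByteArray());
    }

    /** Decode the operator keys; meant to be called once per map task. */
    @SuppressWarnings("unchecked")
    static List<String> deserialize(String encoded)
            throws IOException, ClassNotFoundException {
        byte[] raw = Base64.getDecoder().decode(encoded);
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(raw))) {
            return (List<String>) in.readObject();
        }
    }

    /** Stand-in for a root operator of the map plan. */
    static class Operator {
        final String key;
        int tuplesSeen = 0;
        Operator(String key) { this.key = key; }
        void attachInput(Object tuple) { tuplesSeen++; }
    }

    /** Stand-in for the map task: setup in configure(), no lookups in map(). */
    static class SketchMapper {
        private final List<Operator> targets = new ArrayList<>();
        int lookups = 0;

        void configure(Map<String, String> conf, Map<String, Operator> mapPlan)
                throws IOException, ClassNotFoundException {
            for (String key : deserialize(conf.get(TARGET_OPS_KEY))) {
                Operator op = mapPlan.get(key);   // one lookup per target, once
                if (op == null) {
                    throw new IllegalStateException("No operator for key " + key);
                }
                targets.add(op);
                lookups++;
            }
        }

        void map(Object tuple) {
            // The plain tuple is handed to the pre-resolved targets.
            for (Operator op : targets) {
                op.attachInput(tuple);
            }
        }
    }

    public static void main(String[] args) throws Exception {
        Map<String, Operator> mapPlan = new HashMap<>();
        mapPlan.put("scope-1", new Operator("scope-1"));
        mapPlan.put("scope-2", new Operator("scope-2"));

        // This input feeds only the first root of the plan.
        Map<String, String> conf = new HashMap<>();
        conf.put(TARGET_OPS_KEY, serialize(Arrays.asList("scope-1")));

        SketchMapper mapper = new SketchMapper();
        mapper.configure(conf, mapPlan);
        for (int i = 0; i < 3; i++) {
            mapper.map("tuple-" + i);
        }
        System.out.println("lookups=" + mapper.lookups
                + " scope-1=" + mapPlan.get("scope-1").tuplesSeen
                + " scope-2=" + mapPlan.get("scope-2").tuplesSeen);
    }
}
```

Running main prints `lookups=1 scope-1=3 scope-2=0`: the single operator lookup happens in configure(), and the three map() calls only forward the tuple to the already resolved target.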