PIG-1474

Avoid serialization/deserialization costs for PigStorage data - Use custom Tuple

    Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 0.8.0
    • Fix Version/s: None
    • Component/s: None
    • Labels:
      None

      Description

      Avoid serialization/deserialization (sedes) where possible for data loaded using PigStorage by implementing approach #4 proposed in http://wiki.apache.org/pig/AvoidingSedes.

      The write() and readFields() functions of the tuple returned by TupleFactory are used to serialize data between Map and Reduce. By using a tuple that knows the serialization format of the loader, we avoid sedes at the Map-Reduce boundary and use the load function's serialized format between Map and Reduce.
      To use a new custom tuple for this purpose, a custom TupleFactory that returns tuples of this type has to be specified using the property "pig.data.tuple.factory.name".
      This approach will work only for a set of load functions in the query that share the same serialization format for maps and bags. If this approach proves to be very useful, it will build a case for a more extensible approach.
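
      The idea of a tuple that knows the loader's serialization format can be sketched in plain Java as follows. This is only an illustration under stated assumptions, not Pig code: the class RawLineTuple, its isParsed() helper, and the length-prefixed wire layout are hypothetical, and a real implementation would have to implement Pig's Tuple interface and be returned by the custom TupleFactory named in "pig.data.tuple.factory.name".

```java
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical tuple that keeps the delimited bytes exactly as the loader
 * read them and only splits them into fields when a field is requested.
 */
final class RawLineTuple {
    private final byte[] raw;     // the record in the loader's own format
    private final byte delimiter; // field delimiter, e.g. '\t'
    private List<String> fields;  // stays null until the first get()

    RawLineTuple(byte[] raw, byte delimiter) {
        this.raw = raw.clone();
        this.delimiter = delimiter;
    }

    /** Splits the raw bytes on first access; later calls reuse the result. */
    String get(int index) {
        if (fields == null) {
            fields = split();
        }
        return fields.get(index);
    }

    /** True once the raw bytes have been split into fields. */
    boolean isParsed() {
        return fields != null;
    }

    /** Writes the raw bytes with a length prefix; no per-field encoding. */
    void write(DataOutput out) throws IOException {
        out.writeInt(raw.length);
        out.write(raw);
    }

    /** Reads a tuple back without parsing its fields. */
    static RawLineTuple readFields(DataInput in, byte delimiter) throws IOException {
        int length = in.readInt();
        byte[] buffer = new byte[length];
        in.readFully(buffer);
        return new RawLineTuple(buffer, delimiter);
    }

    private List<String> split() {
        List<String> result = new ArrayList<>();
        int start = 0;
        for (int i = 0; i <= raw.length; i++) {
            // A field ends at each delimiter and at the end of the record.
            if (i == raw.length || raw[i] == delimiter) {
                result.add(new String(raw, start, i - start, StandardCharsets.UTF_8));
                start = i + 1;
            }
        }
        return result;
    }
}
```

      In this sketch a tuple that is only passed from Map to Reduce is never split into fields on the map side: write() copies the loader's bytes through, and parsing happens only if the reduce side calls get().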

        Activity

        Thejas M Nair added a comment -

        Unlinking from 0.8 release.
        I was planning to use the lazy implementations of Map and Bag proposed in PIG-1473 for this. Those objects would have held a copy of the serialized versions of the map and bag. But the plan in that JIRA had to be abandoned for reasons mentioned there. A different approach is required to solve the issue.


          People

          • Assignee: Thejas M Nair
          • Reporter: Thejas M Nair
          • Votes: 0
          • Watchers: 0

            Dates

            • Created:
              Updated:
