  1. Pig
  2. PIG-686

PERFORMANCE: improve how data is stored between M-R jobs and between Map and Reduce

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Won't Fix
    • Affects Version/s: 0.2.0
    • None
    • None
    • None

    Description

      Currently, there is considerable overhead in how the data is serialized in both cases, because type information is stored with each field.

      However, most of the time the data has a known and consistent schema, in which case it is sufficient to store the schema once.

      This change could substantially decrease the amount of intermediate data generated.
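
To make the overhead concrete, here is a minimal sketch comparing the two layouts. It is not Pig's actual serialization code: the one-byte type tags, the helper names, and the wire format are all hypothetical and chosen only for illustration.

```python
import struct

# Hypothetical one-byte type tags; these are not Pig's real DataType codes.
TAG_INT, TAG_STR = 0, 1


def _encode_value(value):
    """Encode one field with no type information attached."""
    if isinstance(value, int):
        return struct.pack(">i", value)  # 4-byte big-endian int
    data = value.encode("utf-8")
    return struct.pack(">H", len(data)) + data  # 2-byte length prefix + bytes


def encode_tagged(rows):
    """Per-field layout: every field is preceded by its type tag."""
    out = bytearray()
    for row in rows:
        for value in row:
            out.append(TAG_INT if isinstance(value, int) else TAG_STR)
            out += _encode_value(value)
    return bytes(out)


def encode_with_schema(rows):
    """Schema-once layout: field types are written one time, rows are untagged."""
    schema = [TAG_INT if isinstance(v, int) else TAG_STR for v in rows[0]]
    out = bytearray([len(schema)]) + bytearray(schema)  # header: count + tags
    for row in rows:
        for value in row:
            out += _encode_value(value)
    return bytes(out)


# 1000 rows sharing the same (int, string) schema.
rows = [(i, "user%d" % i) for i in range(1000)]
tagged = encode_tagged(rows)
compact = encode_with_schema(rows)
saved = len(tagged) - len(compact)
```

In this toy format the tagged layout spends one extra byte per field, so for 1000 two-field rows it is 1997 bytes larger than the schema-once layout (2000 tag bytes minus a 3-byte schema header). The saving grows linearly with the number of fields and records, which is why it matters for intermediate data.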

          People

            Assignee: Unassigned
            Reporter: Olga Natkovich (olgan)
