Pig / PIG-3017

Pig's object serialization should use compression

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.12.0
    • Component/s: None
    • Labels: None

      Description

      We have run into cases of very large JobConf objects, and part of this is the fact that serialized objects are quite large. There is no reason not to use compression here, and ratios should be quite high.
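
A minimal sketch of the idea, not the code in the attached patch: Java-serialize an object through a `DeflaterOutputStream` and Base64-encode the compressed bytes so they can be stored as a string value. The class name `CompressedSerializer` and its methods are hypothetical, and `java.util.Base64` (Java 8+) is assumed only to keep the example self-contained. `Deflater.BEST_COMPRESSION` and Base64 are the choices mentioned in the comments on this issue.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Base64;
import java.util.zip.Deflater;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.InflaterInputStream;

// Hypothetical helper for illustration; this is not Pig's actual serializer.
final class CompressedSerializer {
    private CompressedSerializer() {}

    /** Java-serializes obj, deflates the bytes, and Base64-encodes the result. */
    static String serialize(Serializable obj) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        Deflater deflater = new Deflater(Deflater.BEST_COMPRESSION);
        try (ObjectOutputStream out =
                 new ObjectOutputStream(new DeflaterOutputStream(bytes, deflater))) {
            out.writeObject(obj);
        } catch (IOException e) {
            throw new UncheckedIOException("Could not serialize object", e);
        } finally {
            // close() does not release a caller-supplied Deflater, so end it here.
            deflater.end();
        }
        return Base64.getEncoder().encodeToString(bytes.toByteArray());
    }

    /** Reverses serialize(): Base64-decode, inflate, then Java-deserialize. */
    static Object deserialize(String encoded) {
        byte[] compressed = Base64.getDecoder().decode(encoded);
        try (ObjectInputStream in = new ObjectInputStream(
                 new InflaterInputStream(new ByteArrayInputStream(compressed)))) {
            return in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException("Could not deserialize object", e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException("Class of serialized object not found", e);
        }
    }

    public static void main(String[] args) {
        // Synthetic, repetitive payload standing in for a large serialized object.
        ArrayList<String> payload = new ArrayList<>();
        for (int i = 0; i < 10_000; i++) {
            payload.add("pig.example.key." + (i % 50));
        }
        String encoded = serialize(payload);
        System.out.println("encoded length: " + encoded.length());
        System.out.println("round trip ok: " + payload.equals(deserialize(encoded)));
    }
}
```

Base64 expands the compressed bytes by about a third, so the net saving depends on how compressible the serialized object is.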

      1. PIG-3017-0.patch
        32 kB
        Jonathan Coveney

        Activity

        Jonathan Coveney added a comment -

        This passes test-commit, though I had to change one golden file because this serialization affects the results (shakes fist at golden files).

        Prashant Kommireddi added a comment -

        Hey Jon, out of curiosity - have you done any comparison between object sizes before and after this patch, and also comparisons w.r.t time? Just trying to understand if Deflater.BEST_COMPRESSION is the ideal choice.

        Jonathan Coveney added a comment -

        Well, I don't know the absolute size, because the script I had was failing once the JobConf reached about 6.5MB... I'm not sure if it fails as soon as it crosses the threshold, or only after serializing everything. That said, after this patch the same JobConf was 600KB, so roughly a 10x reduction (note that I also changed it to use Base64 encoding). As for serialization time, the data is still only on the order of ~5MB, so compression time is negligible. I did not do extensive testing around the specifics, though.
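
The size-versus-time question in this exchange could be checked with a small measurement along the following lines. This is a sketch under stated assumptions: `DeflateLevelComparison` and `compressedSize` are hypothetical names, and the payload is synthetic text, not a real JobConf, so the numbers it prints say nothing about Pig itself.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.Deflater;

// Hypothetical measurement harness; names and payload are illustrative only.
final class DeflateLevelComparison {
    private DeflateLevelComparison() {}

    /** Returns the number of bytes produced by deflating input at the given level. */
    static int compressedSize(byte[] input, int level) {
        Deflater deflater = new Deflater(level);
        try {
            deflater.setInput(input);
            deflater.finish();
            byte[] buffer = new byte[8192];
            int total = 0;
            // Drain the compressor; only the output size matters here.
            while (!deflater.finished()) {
                total += deflater.deflate(buffer);
            }
            return total;
        } finally {
            deflater.end();
        }
    }

    public static void main(String[] args) {
        // Synthetic, repetitive key=value text standing in for serialized configuration.
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 200_000; i++) {
            sb.append("pig.example.key.").append(i % 100).append("=value\n");
        }
        byte[] input = sb.toString().getBytes(StandardCharsets.UTF_8);

        int[] levels = {
            Deflater.BEST_SPEED, Deflater.DEFAULT_COMPRESSION, Deflater.BEST_COMPRESSION
        };
        for (int level : levels) {
            long start = System.nanoTime();
            int size = compressedSize(input, level);
            long micros = (System.nanoTime() - start) / 1_000;
            System.out.printf("level=%d in=%d out=%d time=%dus%n",
                    level, input.length, size, micros);
        }
    }
}
```

A single timed pass like this is only a rough indication, since JIT warm-up affects the first measurements; a proper comparison would repeat each level several times on representative data.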

        Prashant Kommireddi added a comment -

        Sounds reasonable, thanks.

        Julien Le Dem added a comment -

        +1 looks good to me

        Jonathan Coveney added a comment -

        Added to 0.11 and trunk, thanks Julien


          People

          • Assignee: Jonathan Coveney
          • Reporter: Jonathan Coveney
          • Votes: 0
          • Watchers: 4
