PIG-42

Pig should be able to split Gzip files like it can split Bzip files

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Won't Fix
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: impl
    • Labels: None

      Description

      It would be nice to be able to split gzip files like we can split bzip files. Unfortunately, we don't have a sync point for the split in the gzip format.

      The gzip file format supports concatenation: when gzipped files are concatenated together, they are treated as a single file. So to make a gzipped file splittable, we can use an empty compressed file with some salt in its headers as a sync signature. Placing this sync signature between the compressed segments of the file makes the gzip file splittable.
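
      The idea above can be illustrated with a minimal sketch. This is not the code in gzip.patch; it is an assumption-laden illustration in Python. The salt value, the "SY" subfield ID, and the helper names (make_sync_marker, write_splittable, read_segments) are all hypothetical. The marker is an empty gzip member whose salt is carried in the header's optional extra field, so an ordinary gzip reader still sees one valid file, while a split-aware reader can cut the file at each marker.

```python
import gzip
import struct
import zlib


def make_sync_marker(salt: bytes) -> bytes:
    """Build an empty gzip member carrying `salt` in the FEXTRA header field."""
    # Raw deflate stream for zero bytes of input.
    co = zlib.compressobj(wbits=-zlib.MAX_WBITS)
    empty_deflate = co.compress(b"") + co.flush()
    # One extra subfield: 2-byte ID ("SY" is a made-up ID), 2-byte length, data.
    extra = b"SY" + struct.pack("<H", len(salt)) + salt
    # ID1, ID2, CM=8 (deflate), FLG=0x04 (FEXTRA), MTIME=0, XFL=0, OS=255, XLEN.
    header = struct.pack("<BBBBIBBH", 0x1F, 0x8B, 8, 0x04, 0, 0, 255, len(extra))
    # Trailer: CRC32 and size of the (empty) uncompressed data, both zero.
    trailer = struct.pack("<II", 0, 0)
    return header + extra + empty_deflate + trailer


# Hypothetical salt; a real one should be long and random enough that the
# marker bytes are very unlikely to occur inside compressed data by chance.
MARKER = make_sync_marker(b"hypothetical-salt-0123")


def write_splittable(chunks, marker):
    """Compress each chunk as its own gzip member, followed by the marker."""
    return b"".join(gzip.compress(chunk) + marker for chunk in chunks)


def read_segments(blob, marker):
    """Cut the blob at each marker and decompress every segment independently."""
    return [gzip.decompress(seg) for seg in blob.split(marker) if seg]
```

      With these helpers, gzip.decompress(write_splittable(chunks, MARKER)) yields the concatenation of all chunks, as any gzip tool would, while read_segments recovers each chunk separately. A real split reader would instead seek to an arbitrary offset and scan forward to the next marker before decompressing.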

      1. gzip.patch
        15 kB
        Benjamin Reed

        Issue Links

          Activity

          Benjamin Reed created issue
          Benjamin Reed made changes -
          Field Original Value New Value
          Attachment gzip.patch [ 12370765 ]
          Olga Natkovich made changes -
          Assignee Benjamin Reed [ breed ]
          Olga Natkovich made changes -
          Patch Info [Patch Available]
          Owen O'Malley made changes -
          Workflow jira [ 12418372 ] no-reopen-closed, patch-avail [ 12425423 ]
          Tom White made changes -
          Link This issue relates to HADOOP-5014 [ HADOOP-5014 ]
          Olga Natkovich made changes -
          Status Open [ 1 ] Resolved [ 5 ]
          Resolution Won't Fix [ 2 ]
          Greg Roelofs made changes -
          Link This issue relates to MAPREDUCE-1795 [ MAPREDUCE-1795 ]

            People

            • Assignee:
              Benjamin Reed
              Reporter:
              Benjamin Reed
            • Votes:
              1
              Watchers:
              3
