Pig / PIG-42

Pig should be able to split Gzip files like it can split Bzip files

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Won't Fix
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: impl
    • Labels:
      None

      Description

      It would be nice to be able to split gzip files the way we can split bzip files. Unfortunately, the gzip format does not give us a sync point for the split.

      The gzip file format supports concatenating gzipped files: when gzipped files are concatenated together, they are treated as a single file. So, to make a gzipped file splittable, we can use an empty compressed file with some salt in the headers as a sync signature, and then make the gzip file splittable by inserting this sync signature between the compressed segments of the file.
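      The concatenation property the description relies on can be checked with a minimal Python sketch (illustrative only, not the patch itself): two segments compressed as separate gzip members, with an empty member standing in for the sync signature, still decompress as one stream.

```python
import gzip

# Two data segments compressed as separate gzip members
# (mtime pinned so the output is deterministic).
part1 = gzip.compress(b"hello ", mtime=0)
part2 = gzip.compress(b"world", mtime=0)

# An empty member stands in for the sync signature; standard
# gzip tools treat the whole concatenation as a single file.
marker = gzip.compress(b"", mtime=0)
combined = part1 + marker + part2

# Multi-member streams decompress transparently.
assert gzip.decompress(combined) == b"hello world"
```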

      1. gzip.patch
        15 kB
        Benjamin Reed

        Issue Links

          Activity

          Benjamin Reed created issue -
          Benjamin Reed added a comment -

          The attached patch implements the method of splitting gzipped files outlined in the issue description. It uses the same hooks as BZip. We need to review it to make sure it terminates properly.

          If the gzipped file is not set up for splits, we fall back to not splitting the file.

          An unsplittable gzipped dataset can be converted to a splittable one with the following Pig Latin:

          a = load 'orig.gz';
          store a into 'splittable.gz';

          Benjamin Reed made changes -
          Field Original Value New Value
          Attachment gzip.patch [ 12370765 ]
          Sam Pullara added a comment -

          Is there any reason you decided not to use the gzip ID instead of empty files? It seems like it would be better if people could generate these files themselves easily, without using Pig at all. Each gzip file will start with "1F 8B 08 08" [1] if you use this mechanism to create them:

          gzip -c test1 test2 > test.gz [2]

          In the few cases where the boundary is wrong, you will get an exception from your gzip stream and can try again at the next boundary.

          [1] http://www.gzip.org/zlib/rfc-gzip.html
          [2] man gzip

          Benjamin Reed added a comment -

          There are two reasons I use an empty file with a comment:

          1) It allows me to test that a gzip file is in fact splittable. We need to know up front that we can split the gzip file; if the gzip isn't split at regular intervals, trying is going to waste a lot of time! The signature is more than a marker: it is metadata indicating that the file can be split. You will also notice that if you run 'head' on the file, you can see that it is splittable.

          2) It gives you a much more reliable signature (20 bytes instead of 4).

          You can still use standard tools without using Pig:

          cat signature.gz > test.gz; gzip -c test1 >> test.gz; cat signature.gz >> test.gz; gzip -c test2 >> test.gz

          You use standard gunzip to decompress. You can also easily find the split boundaries outside of Pig by looking for the signature.gz byte sequence.

          This also allows you to better control the grouping. If your gzip file is bigger than 4G, it will be a concatenation, so there may be times when you want to process concatenated gzip files together without splitting. Using the empty signature file allows you to do that.

          Now that I think about it more, it might also be good to reserve some bytes in signature.gz to hold a block size. That way we can do intelligent splits when the fs blocksize doesn't correspond to the gzip blocksize or the number of requested splits is very high.
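          A rough Python model of the comment-as-signature idea follows. The marker text and exact layout here are illustrative assumptions, not the patch's actual 20-byte signature; the member is built by hand per RFC 1952 because Python's gzip module cannot set the FCOMMENT field.

```python
import gzip
import struct
import zlib

# An empty gzip member carrying a fixed comment acts as a long,
# distinctive split signature (the comment text is a made-up example).
comment = b"pig-split-marker"
header = (b"\x1f\x8b\x08"              # magic bytes + deflate method
          + b"\x10"                    # FLG: FCOMMENT set
          + b"\x00\x00\x00\x00"        # MTIME = 0
          + b"\x00\xff"                # XFL = 0, OS = unknown
          + comment + b"\x00")         # zero-terminated comment field
empty_body = zlib.compress(b"")[2:-4]  # raw deflate of empty input
trailer = struct.pack("<II", zlib.crc32(b""), 0)  # CRC32, ISIZE
signature = header + empty_body + trailer

# The signature is itself a valid, empty gzip member...
assert gzip.decompress(signature) == b""

# ...so concatenations that embed it still decompress with standard tools.
stream = (gzip.compress(b"seg1", mtime=0) + signature
          + gzip.compress(b"seg2", mtime=0))
assert gzip.decompress(stream) == b"seg1seg2"
```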

          Owen O'Malley added a comment -

          It seems a lot more friendly to define the format like:

          % touch empty
          % gzip -nc part0 empty part1 empty part2 empty part3 > big.sgz
          

          That would let the user do:

          % gzcat big.sgz
          

          to get their file back. I'd also use filenames rather than a header to reflect whether a file is in this format, but that is mostly just a personal preference.

          Benjamin Reed added a comment -

          There are two problems with just using an empty file.

          1) The signature is just too small to reliably detect the split. Misdetecting the split isn't as easy to recover from as retrying, because it usually means you get an OutOfMemoryError, or you may have already returned bad data.

          2) You have to revert to relying on an extension to detect splittability. This ends up being pretty hokey, because most gzip utilities are looking for a .gz extension. The splittable gzip format is completely compatible with existing gzip utilities. Also, if a user puts the wrong extension, splits may not happen when they could, or we may try to split files that we cannot.

          Plus, it's really nice to be able to run head on file.gz and see right away whether the file is splittable or not.

          Sam Pullara added a comment -

          Ok, I'm convinced. Ship it!

          Olga Natkovich added a comment -

          Ben, how much testing did this code go through?

          Benjamin Reed added a comment -

          The patch is not ready to commit yet; it's a work-in-progress patch. I talked to Utkarash about this, and it's missing a termination of the split: currently each split will not terminate correctly. There is a termination hook that bzip uses that I need to latch into.

          Basically here are the things I need to add to finish:

          1) Terminate split processing correctly
          2) Add test cases
          3) Encode block size as part of the header so that we can get almost "perfect" splits. (For example a file that is compressed as 128M blocks should not be split on 64M boundaries even if the block size of the filesystem is 128M.)

          I'll try to get a committable patch this weekend.

          Olga Natkovich added a comment -

          Looks like the Hadoop folks might do it soon: https://issues.apache.org/jira/browse/HADOOP-1824

          Olga Natkovich made changes -
          Assignee Benjamin Reed [ breed ]
          Olga Natkovich added a comment -

          Cleared the Patch Available flag since this patch is not yet ready for review.

          Olga Natkovich made changes -
          Patch Info [Patch Available]
          Owen O'Malley made changes -
          Workflow jira [ 12418372 ] no-reopen-closed, patch-avail [ 12425423 ]
          Tom White added a comment -

          It would be nice if the format could be generated using standard tools. By modifying the gzip header flag so that the signature lives in the file name field (which the gzip tool can set) rather than in a comment (which it cannot), we can generate compatible files using the following:

          touch -mt 197007130719.25 Split
          gzip -c Split file1 Split file2 > file.gz
          

          Then the first split file has the following hexdump:

          hexdump -n 26 -C file.gz
          00000000  1f 8b 08 08 6d ca fe 00  00 03 53 70 6c 69 74 00  |....m.....Split.|
          00000010  03 00 00 00 00 00 00 00  00 00                    |..........|
          0000001a
          

          Note that the OS flag is 03 (Unix) rather than FF (unknown), but that should be OK as the code doesn't use it when searching for the signature.
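          This construction can also be reproduced from Python as a sketch. The mtime value below is decoded from the hexdump above (bytes 6d ca fe 00, little-endian); the resulting member carries the "Split" name in its FNAME field, and the OS byte differs (FF rather than 03) exactly as noted, without affecting the signature search.

```python
import gzip
import io

# An empty gzip member whose FNAME field is "Split" and whose mtime is
# fixed, so every marker produced this way has an identical prefix.
buf = io.BytesIO()
with gzip.GzipFile(filename="Split", mode="wb",
                   fileobj=buf, mtime=0x00FECA6D) as f:
    f.write(b"")
marker = buf.getvalue()

# Magic bytes, deflate method, FNAME flag: the "1f 8b 08 08" prefix.
assert marker[:4] == b"\x1f\x8b\x08\x08"
# Fixed mtime, little-endian, matching the hexdump above.
assert marker[4:8] == b"\x6d\xca\xfe\x00"
# Zero-terminated file name embedded in the header.
assert b"Split\x00" in marker
# Still a valid empty gzip member.
assert gzip.decompress(marker) == b""
```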

          Tom White made changes -
          Link This issue relates to HADOOP-5014 [ HADOOP-5014 ]
          David Ciemiewicz added a comment -

          Hadoop Archives are not really the solution here. I want my code to work with exactly the same file name references whether I have 100 gzip-compressed (or bzip2-compressed) part files or a single concatenation of the individually compressed part files.

          I have to change all my filename references to use a har.

          What we really want are simple concatenations of gzip files and bzip2 files that work with MapReduce.

          Olga Natkovich added a comment -

          Pig no longer deals with compression. It is now up to individual loaders/storers to handle it.

          Olga Natkovich made changes -
          Status Open [ 1 ] Resolved [ 5 ]
          Resolution Won't Fix [ 2 ]
          Greg Roelofs made changes -
          Link This issue relates to MAPREDUCE-1795 [ MAPREDUCE-1795 ]

            People

            • Assignee:
              Benjamin Reed
              Reporter:
              Benjamin Reed
            • Votes:
              1
            • Watchers:
              3

              Dates

              • Created:
                Updated:
                Resolved:
