Description
Files compressed with the gzip codec are not splittable due to the nature of the codec.
This limits the options for scaling out when reading large gzipped input files.
Given that gunzipping a 1 GiB file usually takes only about 2 minutes, I figured that for some use cases deliberately wasting some resources may result in a shorter overall job time.
So reading the entire input file from the start for each split, and discarding the decompressed bytes that belong to earlier splits (wasting CPU and I/O!), can still improve scalability, as the sketch below illustrates.
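To make the idea concrete, here is a minimal Java sketch of what a per-split reader could do. This is not the patch attached to this issue; the file name, split offset, and helper names are made up for illustration. Each worker opens the same gzip stream, decompresses from byte 0, skips the uncompressed prefix that belongs to earlier splits, and then reads only its own slice.

{code:java}
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;
import java.util.zip.GZIPInputStream;

/**
 * Minimal sketch (hypothetical, not the attached patch): every split
 * decompresses the gzip stream from the start and throws away the
 * uncompressed bytes that belong to earlier splits.
 */
public class NaiveSplittableGzipReader {

    /** Skip 'toSkip' uncompressed bytes, then read up to 'length' bytes. */
    static byte[] readUncompressedRange(String path, long toSkip, int length)
            throws IOException {
        try (InputStream in = new GZIPInputStream(new FileInputStream(path))) {
            long skipped = 0;
            byte[] scratch = new byte[64 * 1024];
            while (skipped < toSkip) {
                // skip() may return short counts, so loop until done.
                long n = in.skip(toSkip - skipped);
                if (n <= 0) {
                    // Some streams stall on skip(); read and discard instead.
                    int r = in.read(scratch, 0,
                            (int) Math.min(scratch.length, toSkip - skipped));
                    if (r < 0) break; // stream shorter than expected
                    n = r;
                }
                skipped += n;
            }
            byte[] out = new byte[length];
            int off = 0;
            while (off < length) {
                int r = in.read(out, off, length - off);
                if (r < 0) break;
                off += r;
            }
            return off == length ? out : Arrays.copyOf(out, off);
        }
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical usage: the third of N workers reads its slice of the
        // uncompressed stream, paying the cost of decompressing the prefix.
        byte[] slice = readUncompressedRange("input.gz", 2L * 64_000_000, 1024);
        System.out.println("read " + slice.length + " bytes");
    }
}
{code}

The point of the sketch is the trade-off, not the code: every split re-does the decompression work of all earlier splits, but because gunzip is cheap relative to record processing, the splits can still finish sooner in parallel than one task reading the whole file alone.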
Attachments
Issue Links
is related to:
- HADOOP-7909 Implement a generic splittable signature-based compression format (Open)
- HADOOP-6153 RAgzip: multiple map tasks for a large gzipped file (Resolved)
- SPARK-29102 Read gzipped file into multiple partitions without full gzip expansion on a single-node (Resolved)