[HADOOP-6835] Support concatenated gzip files - ASF JIRA

Details

Type: Improvement
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: 0.20.2
Fix Version/s: 0.22.0
Component/s: io
Labels: None
Hadoop Flags: Incompatible change, Reviewed
Release Note:
Processing of concatenated gzip files formerly stopped (quietly) at the end of the first substream/"member"; now processing will continue to the end of the concatenated stream, like gzip(1) does. (bzip2 support is unaffected by this patch.)

Description

When MapReduce is run with concatenated gzip files as input, only the first member is read and the rest of the input is silently ignored, which is confusing, to say the least. Concatenated gzip is described in http://www.gnu.org/software/gzip/manual/gzip.html#Advanced-usage and in http://www.ietf.org/rfc/rfc1952.txt. (See the original report at http://www.nabble.com/Problem-with-Hadoop-and-concatenated-gzip-files-to21383097.html)
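The difference between the old and new behavior can be illustrated outside Hadoop. The following is a minimal Python sketch, not the code from any patch attached to this issue; the function names `read_first_member_only` and `read_all_members` are hypothetical and exist only for this illustration. It assumes the input consists solely of back-to-back gzip members with no trailing padding.

```python
import gzip
import zlib

# wbits value that tells zlib to expect a gzip header and trailer.
GZIP_WBITS = zlib.MAX_WBITS | 16


def read_first_member_only(data: bytes) -> bytes:
    """Mimic the old behavior: decode one member and quietly stop."""
    d = zlib.decompressobj(GZIP_WBITS)
    # Any bytes after the first member end up in d.unused_data and are dropped.
    return d.decompress(data)


def read_all_members(data: bytes) -> bytes:
    """Mimic the new behavior: keep decoding until the input is exhausted."""
    chunks = []
    while data:
        d = zlib.decompressobj(GZIP_WBITS)
        chunks.append(d.decompress(data))
        chunks.append(d.flush())
        if not d.eof:
            raise ValueError("truncated gzip member")
        # Whatever follows the member's trailer is the next member (if any).
        data = d.unused_data
    return b"".join(chunks)


# Two independently compressed members, concatenated as `cat a.gz b.gz` would.
concatenated = gzip.compress(b"first\n") + gzip.compress(b"second\n")

first_only = read_first_member_only(concatenated)
everything = read_all_members(concatenated)
```

Here `first_only` holds just the first member's data, while `everything` holds the contents of both members, which is also what Python's own `gzip.decompress` returns for the same input.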

Attachments

File | Date | Size | Author
C6835-9.patch | 07/Jul/10 23:25 | 49 kB | Christopher Douglas
grr-hadoop-common.dif.20100614c | 15/Jun/10 03:39 | 37 kB | Greg Roelofs
grr-hadoop-mapreduce.dif.20100614c | 15/Jun/10 03:39 | 25 kB | Greg Roelofs
HADOOP-6835.v3.yahoo-0.20.2xx-branch.patch | 25/Jun/10 18:40 | 82 kB | Greg Roelofs
HADOOP-6835.v4.trunk-hadoop-common.patch | 29/Jun/10 03:26 | 18 kB | Greg Roelofs
HADOOP-6835.v4.trunk-hadoop-mapreduce.patch | 29/Jun/10 03:26 | 47 kB | Greg Roelofs
HADOOP-6835.v4.yahoo-0.20.2xx-branch.patch | 25/Jun/10 20:06 | 83 kB | Greg Roelofs
HADOOP-6835.v5.trunk-hadoop-common.patch | 01/Jul/10 01:37 | 20 kB | Greg Roelofs
HADOOP-6835.v6.trunk-hadoop-common.patch | 07/Jul/10 00:59 | 41 kB | Greg Roelofs
HADOOP-6835.v7.trunk-hadoop-common.patch | 07/Jul/10 01:59 | 41 kB | Greg Roelofs
HADOOP-6835.v8.trunk-hadoop-common.patch | 07/Jul/10 03:49 | 47 kB | Greg Roelofs
HADOOP-6835.v9.yahoo-0.20.2xx-branch.patch | 08/Jul/10 02:13 | 46 kB | Greg Roelofs
MR-469.v2.yahoo-0.20.2xx-branch.patch | 22/Jun/10 04:01 | 69 kB | Greg Roelofs

Issue Links

blocks:
HADOOP-7386 Support concatenated bzip2 files (Resolved)

is related to:
HADOOP-6335 Support reading of concatenated gzip and bzip2 files (Resolved)
PIG-42 Pig should be able to split Gzip files like it can split Bzip files (Resolved)

relates to:
MAPREDUCE-1795 add error option if file-based record-readers fail to consume all input (e.g., concatenated gzip, bzip2) (Resolved)

People

Assignee: Greg Roelofs
Reporter: Thomas White
Votes: 2
Watchers: 14

Dates

Created: 12/Jan/09 13:16
Updated: 12/Dec/11 06:19
Resolved: 07/Jul/10 23:23