Details
Type: Improvement
Status: Closed
Priority: Major
Resolution: Fixed
0.20.2
None
Hadoop Flags: Incompatible change, Reviewed
Processing of concatenated gzip files formerly stopped (quietly) at the end of the first substream/"member"; now processing will continue to the end of the concatenated stream, like gzip(1) does. (bzip2 support is unaffected by this patch.)
Description
When MapReduce is run with concatenated gzip files as input, only the first part (the first gzip member) is read and the rest is silently ignored, which is confusing, to say the least. Concatenated gzip is described in http://www.gnu.org/software/gzip/manual/gzip.html#Advanced-usage and in http://www.ietf.org/rfc/rfc1952.txt. (See the original report at http://www.nabble.com/Problem-with-Hadoop-and-concatenated-gzip-files-to21383097.html)
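To make the difference concrete, here is a minimal sketch using Python's standard `zlib` and `gzip` modules rather than Hadoop's `GzipCodec`. It is not the patch itself. The helpers `read_first_member_only` and `read_all_members` are hypothetical names. The first helper mimics the old behavior by decoding one member and quietly dropping the remaining bytes. The second keeps decoding `unused_data` until the input is exhausted, as gzip(1) does.

```python
import gzip
import zlib

GZIP_WBITS = 31  # 16 + MAX_WBITS(15): tells zlib to expect a gzip header and trailer


def read_first_member_only(data: bytes) -> bytes:
    """Hypothetical helper mimicking the old behavior: decode one member, ignore the rest."""
    d = zlib.decompressobj(GZIP_WBITS)
    # Bytes after the first member's trailer land in d.unused_data and are dropped.
    return d.decompress(data)


def read_all_members(data: bytes) -> bytes:
    """Hypothetical helper decoding every concatenated gzip member, as gzip(1) does."""
    out = []
    while data:
        d = zlib.decompressobj(GZIP_WBITS)  # fresh decoder for each member
        out.append(d.decompress(data))
        if not d.eof:
            raise ValueError("truncated gzip member")
        data = d.unused_data  # whatever follows this member's trailer
    return b"".join(out)


# Two independently compressed members joined byte-for-byte,
# the equivalent of `cat a.gz b.gz > ab.gz`.
member_a = gzip.compress(b"first member\n")
member_b = gzip.compress(b"second member\n")
concatenated = member_a + member_b

print(read_first_member_only(concatenated))  # -> b'first member\n'
print(read_all_members(concatenated))        # -> b'first member\nsecond member\n'
```

Each member carries its own header and trailer, so a decoder that stops at the first end-of-stream marker produces valid but incomplete output. That is why the truncation went unnoticed.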
Attachments
Issue Links
- blocks
  - HADOOP-7386 Support concatenated bzip2 files (Resolved)
- is related to
  - HADOOP-6335 Support reading of concatenated gzip and bzip2 files (Resolved)
  - PIG-42 Pig should be able to split Gzip files like it can split Bzip files (Resolved)
- relates to
  - MAPREDUCE-1795 add error option if file-based record-readers fail to consume all input (e.g., concatenated gzip, bzip2) (Resolved)