Details
- Type: Bug
- Status: Resolved
- Priority: Major
- Resolution: Fixed
- 1.0.3, 2.0.0-alpha
- Reviewed
Description
ResetableGzipOutputStream creates invalid gzip files when finish() and resetState() are used together. finish() flushes the compressor buffer and writes the gzip trailer (the CRC32 and the uncompressed data length). A subsequent resetState() does not write a new gzip header; it simply starts writing more deflate-compressed data after the trailer. The resulting files cannot be read by the Linux "gunzip" tool. ResetableGzipOutputStream should instead write valid multi-member gzip files, in which each member has its own header and trailer.
The gzip format is specified in RFC 1952.
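For illustration, here is a minimal sketch of what a valid multi-member gzip file looks like, using only the JDK's java.util.zip classes rather than Hadoop's codec. The class and method names (MultiMemberGzipSketch, gzipMember, multiMember, gunzipAll) are hypothetical and not part of Hadoop. Each member is written as a complete header, deflate body, and trailer, which is the layout the fix is expected to produce after each finish()/resetState() cycle.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical sketch; not the Hadoop implementation.
public class MultiMemberGzipSketch {

    // Compress one chunk as a complete gzip member:
    // header, deflate-compressed data, then CRC32 + length trailer.
    static byte[] gzipMember(byte[] data) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(buf)) {
            gz.write(data);
        }
        return buf.toByteArray();
    }

    // Build a multi-member file by concatenating independent members.
    static byte[] multiMember(String... chunks) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (String chunk : chunks) {
            out.write(gzipMember(chunk.getBytes(StandardCharsets.UTF_8)));
        }
        return out.toByteArray();
    }

    // Decompress every member in sequence; GZIPInputStream continues
    // past a trailer when another gzip header follows it.
    static String gunzipAll(byte[] file) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPInputStream in =
                 new GZIPInputStream(new ByteArrayInputStream(file))) {
            byte[] tmp = new byte[4096];
            int n;
            while ((n = in.read(tmp)) != -1) {
                out.write(tmp, 0, n);
            }
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        byte[] file = multiMember("first member\n", "second member\n");
        System.out.print(gunzipAll(file));
    }
}
```

The bug described above corresponds to emitting the second chunk's deflate data without its leading header, so a reader that expects the gzip magic bytes (0x1f 0x8b) after the first trailer finds raw deflate data instead.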
Attachments
Issue Links
- blocks
  - FLUME-2967 Corrupted gzip files generated when writting to S3 (Open)
- is duplicated by
  - HADOOP-8625 Use GzipCodec to decompress data in ResetableGzipOutputStream test (Resolved)
  - HADOOP-6799 GzipCodec/CompressionOutputStream resetState() fails to reset gzip header and CRC (Resolved)