Details
- Type: Bug
- Status: Closed
- Priority: Critical
- Resolution: Fixed
- 2.0.4-alpha, 0.23.8
- None
- None
- Reviewed
Description
BZip2Codec.BZip2CompressionInputStream can drop records when a file is read in splits, depending on where record delimiters fall relative to compression block boundaries.
Thanks to Koji Noguchi for discovering this problem while working on PIG-3251.
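For illustration only, the invariant that this bug violates can be sketched on plain, uncompressed bytes: however a file is cut into splits, the records returned by all splits together should equal the records of the whole file, with none dropped or duplicated. The sketch below is not the Hadoop code and does not model bzip2 blocks. The function names `read_split` and `read_all_splits` are hypothetical, and the ownership rule (a split skips its first partial record and keeps reading any record that starts at or before its end offset) is an assumed convention of the kind line-oriented split readers commonly use.

```python
def read_split(data: bytes, start: int, end: int) -> list[bytes]:
    """Return the newline-delimited records owned by the split [start, end).

    Assumed convention: a record belongs to the split in which it starts,
    where a record starting exactly at `end` still belongs to this split.
    """
    pos = start
    if start != 0:
        # Skip the first (possibly partial) record; the previous split owns it.
        idx = data.find(b"\n", start)
        if idx == -1:
            return []
        pos = idx + 1

    records = []
    while pos <= end and pos < len(data):
        idx = data.find(b"\n", pos)
        stop = len(data) if idx == -1 else idx
        records.append(data[pos:stop])
        pos = stop + 1
    return records


def read_all_splits(data: bytes, split_size: int) -> list[bytes]:
    """Read `data` as consecutive splits of `split_size` bytes and join the results."""
    records = []
    for start in range(0, len(data), split_size):
        end = min(start + split_size, len(data))
        records.extend(read_split(data, start, end))
    return records


if __name__ == "__main__":
    data = b"a\nbb\nccc\n"
    whole = data.split(b"\n")
    if whole and whole[-1] == b"":
        whole.pop()  # drop the empty piece after the trailing newline
    # Every split size must yield exactly the whole-file records.
    for size in range(1, len(data) + 2):
        assert read_all_splits(data, size) == whole
```

A regression test for the compressed case could check the same property, comparing the records read across splits of a bzip2 file against the records read from the file as a single split.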
Issue Links
- is duplicated by
  - MAPREDUCE-5143 TestLineRecordReader has no test case for compressed files (Resolved)
- relates to
  - MAPREDUCE-5948 org.apache.hadoop.mapred.LineRecordReader does not handle multibyte record delimiters well (Closed)
  - PIG-3251 Bzip2TextInputFormat requires double the memory of maximum record size (Closed)