Details
- Bug
- Status: Open
- Major
- Resolution: Unresolved
- 0.10.1
- None
- None
- None
Description
If a bz2 block boundary falls in the middle of a record that is terminated by a carriage return (CR), the next record is duplicated. The compressed-stream position is updated at the same moment that a CR is seen without a following line feed (LF). Because of the way position within the compressed stream is reported, the reader incorrectly believes it has read only the CR into the next compression block, so it goes on to process the next record, which is also processed by the consumer of the next split.
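The consequence described above can be illustrated with a small toy model. This is only a sketch under stated assumptions, not Pig's Bzip2TextInputFormat code: splits are fixed-size ranges of uncompressed bytes rather than real bz2 blocks, every reader except the first discards its leading partial record, and the stale position is modelled by a hypothetical `position_lags` flag instead of the actual CR/LF look-ahead. The names `read_split`, `read_all`, and `DATA` are invented for the illustration.

```python
def read_split(data, block_size, split_index, position_lags=False):
    """Return the CR-terminated records a toy reader emits for one split.

    Toy rules (assumptions, not Pig's actual logic):
      * split k covers bytes [k * block_size, (k + 1) * block_size)
      * every reader except the first discards its leading partial record
      * a reader keeps emitting records while its reported position
        has not moved past the end of its split
    """
    start = split_index * block_size
    end = start + block_size
    pos = start

    if split_index > 0:
        # Skip up to and including the first CR; the previous split's
        # reader is responsible for that record.
        cr = data.find(b"\r", pos)
        pos = len(data) if cr == -1 else cr + 1

    records = []
    reported = pos
    while reported <= end and pos < len(data):
        cr = data.find(b"\r", pos)
        record_end = len(data) if cr == -1 else cr
        records.append(data[pos:record_end])
        next_pos = min(record_end + 1, len(data))
        # Hypothetical bug switch: a lagging reader reports where the record
        # it just consumed started, not where the next record begins.
        reported = pos if position_lags else next_pos
        pos = next_pos
    return records


def read_all(data, block_size, position_lags=False):
    """Concatenate the output of every split's reader, in split order."""
    n_splits = -(-len(data) // block_size)  # ceiling division
    out = []
    for k in range(n_splits):
        out.extend(read_split(data, block_size, k, position_lags))
    return out


# Four CR-terminated records; with 8-byte splits, "bbbb" straddles the
# boundary between split 0 and split 1.
DATA = b"aaaa\rbbbb\rcccc\rdddd\r"

print(read_all(DATA, 8))                      # [b'aaaa', b'bbbb', b'cccc', b'dddd']
print(read_all(DATA, 8, position_lags=True))  # [b'aaaa', b'bbbb', b'cccc', b'cccc', b'dddd']
```

In this model the record that straddles the boundary (`bbbb`) is emitted once, but the record after it (`cccc`) is emitted by both the split 0 reader and the split 1 reader once the reported position lags, which matches the symptom in this report.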
Attachments
Issue Links
- is related to
  - PIG-3251 Bzip2TextInputFormat requires double the memory of maximum record size (Closed)