Description
This is a follow-up to MAPREDUCE-6549. Compressed files cause duplicate records, as shown by several JUnit tests. The number of duplicated records varies with the split size:
Unexpected number of records in split (splitsize = 10)
Expected: 41051
Actual: 45062
Unexpected number of records in split (splitsize = 100000)
Expected: 41051
Actual: 41052
The test passes with splitsize = 147445, which is the length of the compressed file. The file is a bzip2 file with 100k blocks and a total of 11 blocks.
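For reference, the invariant the failing tests check can be sketched with a simplified model. It assumes the usual convention for split-based line readers on uncompressed input: a split that does not start at offset 0 discards its first (possibly partial) line, and every split keeps reading while the current line starts at or before the split end. This is only an illustration; it is not Hadoop's code and does not model the bzip2 code path where the duplication occurs. The names `read_split` and `count_records` are hypothetical.

```python
def read_split(data: bytes, start: int, length: int) -> list:
    """Return the records a single split would emit under the assumed convention."""
    end = start + length
    pos = start
    if start != 0:
        # The first line is owned by the previous split, so skip past it.
        nl = data.find(b"\n", pos)
        pos = len(data) if nl == -1 else nl + 1
    records = []
    # A line is read by this split if it starts at or before the split end.
    while pos <= end and pos < len(data):
        nl = data.find(b"\n", pos)
        nxt = len(data) if nl == -1 else nl + 1
        records.append(data[pos:nxt].rstrip(b"\n"))
        pos = nxt
    return records


def count_records(data: bytes, split_size: int) -> int:
    """Sum the records over all splits of the given size."""
    total = 0
    for start in range(0, len(data), split_size):
        length = min(split_size, len(data) - start)
        total += len(read_split(data, start, length))
    return total
```

Under this model every line is emitted by exactly one split, so the total is independent of the split size. The tests above expect the same of the compressed input, and the reported counts show that this does not hold there.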
Attachments
Issue Links
- is related to MAPREDUCE-6549 multibyte delimiters with LineRecordReader cause duplicate records (Closed)