Details
Description
LineRecordReader currently produces duplicate records under certain scenarios, such as:
1) input string: "abc++defghi+"
delimiter string: "+++"
test passes with all sizes of the split
2) input string: "abc+def+ghi+"
delimiter string: "+++"
test fails with a split size of 4
3) input string: "abc++defghi+"
delimiter string: "++"
test fails with a split size of 5
4) input string: "abc++defghij+"
delimiter string: "++"
test fails with a split size of 4
5) input string: "abc+def+ghi+"
delimiter string: "++"
test fails with a split size of 9
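To judge whether a given split size yields duplicates, it helps to know which records each input should produce when read as a single unsplit stream. The following is a minimal sketch of such a reference splitter in plain Java. The class name `ExpectedRecords` and the method `split` are hypothetical helpers for illustration and are not part of Hadoop; the sketch also assumes that a trailing delimiter does not produce an extra empty record.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical reference helper (not a Hadoop class): computes the records
// an input should yield for a multi-character delimiter, ignoring splits.
public class ExpectedRecords {

    static List<String> split(String input, String delimiter) {
        List<String> records = new ArrayList<>();
        int start = 0;
        while (start < input.length()) {
            int end = input.indexOf(delimiter, start);
            if (end < 0) {
                // No further delimiter: the remainder is the last record.
                records.add(input.substring(start));
                break;
            }
            records.add(input.substring(start, end));
            // Skip the whole delimiter before scanning for the next record.
            start = end + delimiter.length();
        }
        return records;
    }

    public static void main(String[] args) {
        System.out.println(split("abc++defghi+", "++"));  // [abc, defghi+]
        System.out.println(split("abc+def+ghi+", "++"));  // [abc+def+ghi+]
        System.out.println(split("abc++defghi+", "+++")); // [abc++defghi+]
    }
}
```

A test for any of the scenarios above could read the same input through the record reader at each split size, concatenate the records from all splits in order, and compare the result with this list; any extra entry indicates a duplicate.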
Attachments
Issue Links
- is duplicated by
  - MAPREDUCE-6891 TextInputFormat: duplicate records with custom delimiter (Resolved)
- relates to
  - MAPREDUCE-6481 LineRecordReader may give incomplete record and wrong position/key information for uncompressed input sometimes. (Closed)
  - MAPREDUCE-6558 multibyte delimiters with compressed input files generate duplicate records (Closed)