[HADOOP-13192] org.apache.hadoop.util.LineReader cannot handle multibyte delimiters correctly - ASF JIRA

Details

Type: Bug
Status: Closed
Priority: Critical
Resolution: Fixed
Affects Version/s: 2.6.2
Fix Version/s: 2.8.0, 2.7.3, 2.6.5, 3.0.0-alpha1
Component/s: util
Labels:
None

Hadoop Flags:

Reviewed

Description

org.apache.hadoop.util.LineReader.readCustomLine() has a bug.
When the line is aaaabccc and recordDelimiter is aaab, the result should be a,ccc.
The code at line 310 is:
for (; bufferPosn < bufferLength; ++bufferPosn) {
  if (buffer[bufferPosn] == recordDelimiterBytes[delPosn]) {
    delPosn++;
    if (delPosn >= recordDelimiterBytes.length) {
      bufferPosn++;
      break;
    }
  } else if (delPosn != 0) {
    bufferPosn--;
    delPosn = 0;
  }
}

It should be:
for (; bufferPosn < bufferLength; ++bufferPosn) {
  if (buffer[bufferPosn] == recordDelimiterBytes[delPosn]) {
    delPosn++;
    if (delPosn >= recordDelimiterBytes.length) {
      bufferPosn++;
      break;
    }
  } else if (delPosn != 0) {
    // ------------- change here ------------- start ----
    bufferPosn -= delPosn;
    // ------------- change here ------------- end ----
    delPosn = 0;
  }
}
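
To make the difference easy to reproduce, here is a minimal stand-alone sketch of the two loop variants. It is not the Hadoop class: DelimiterScanSketch, split, and the rewindFully flag are hypothetical names used only for illustration, and the sketch assumes the whole input sits in one in-memory buffer with a non-empty delimiter. It does not attempt to model buffer refills or a delimiter that spans two buffer fills.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-alone sketch; not org.apache.hadoop.util.LineReader itself.
class DelimiterScanSketch {

    // Splits data on delimiter using the scan loop quoted in the report.
    // rewindFully == false: step back one byte on a partial-match failure (reported code).
    // rewindFully == true:  step back by delPosn bytes (proposed change).
    // Assumes delimiter is non-empty and all data is in a single buffer.
    static List<String> split(String data, String delimiter, boolean rewindFully) {
        byte[] buffer = data.getBytes(StandardCharsets.UTF_8);
        byte[] recordDelimiterBytes = delimiter.getBytes(StandardCharsets.UTF_8);
        int bufferLength = buffer.length;
        List<String> records = new ArrayList<>();
        int bufferPosn = 0;
        while (bufferPosn < bufferLength) {
            int start = bufferPosn;   // first byte of the current record
            int delPosn = 0;          // number of delimiter bytes matched so far
            boolean found = false;
            for (; bufferPosn < bufferLength; ++bufferPosn) {
                if (buffer[bufferPosn] == recordDelimiterBytes[delPosn]) {
                    delPosn++;
                    if (delPosn >= recordDelimiterBytes.length) {
                        bufferPosn++;
                        found = true;
                        break;
                    }
                } else if (delPosn != 0) {
                    // Partial match failed: rewind so the scan resumes one byte
                    // after the position where the failed match began.
                    bufferPosn -= rewindFully ? delPosn : 1;
                    delPosn = 0;
                }
            }
            // Exclude the delimiter bytes from the record when one was found.
            int end = found ? bufferPosn - recordDelimiterBytes.length : bufferLength;
            records.add(new String(buffer, start, end - start, StandardCharsets.UTF_8));
        }
        return records;
    }

    public static void main(String[] args) {
        System.out.println(split("aaaabccc", "aaab", false)); // prints [aaaabccc]
        System.out.println(split("aaaabccc", "aaab", true));  // prints [a, ccc]
    }
}
```

With the one-byte rewind, the scan resumes too late after the failed partial match on the first three a bytes, so the real delimiter starting at the second byte is never seen. Rewinding by delPosn restarts the comparison one byte after where the failed match began.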

Attachments


HADOOP-13192.final.patch, 18/Jul/16 18:19, 5 kB, Akira Ajisaka
0002-fix-bug-hadoop-1392-add-test-case-for-LineReader.patch, 17/Jun/16 05:58, 4 kB, devinzhu
0001-HADOOP-13192-org.apache.hadoop.util.LineReader-match.patch, 17/Jun/16 05:58, 1 kB, devinzhu

Issue Links

relates to

MAPREDUCE-5948 org.apache.hadoop.mapred.LineRecordReader does not handle multibyte record delimiters well

Closed

links to

GitHub Pull Request #99


People

Assignee: devinzhu

Reporter: devinzhu

Votes: 0

Watchers: 9

Dates

Created: 23/May/16 11:04

Updated: 04/Dec/17 02:37

Resolved: 20/Jun/16 08:14

Time Tracking

Estimated: Not Specified

Remaining: Not Specified

Logged: Not Specified