- Type: Bug
- Status: Closed
- Priority: Critical
- Resolution: Fixed
- Affects Version/s: 2.6.2
- Fix Version/s: 2.8.0, 2.7.3, 2.6.5, 3.0.0-alpha1
- Component/s: util
- Labels: None
- Hadoop Flags: Reviewed

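org.apache.hadoop.util.LineReader.readCustomLine() has a bug:
when the line is aaaabccc and recordDelimiter is aaab, the result should be the two records a and ccc, but the delimiter is missed because after a partial match the scan backs up by only one byte.
The code in question, around line 310: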
    for (; bufferPosn < bufferLength; ++bufferPosn) {
      if (buffer[bufferPosn] == recordDelimiterBytes[delPosn]) {
        delPosn++;
        if (delPosn >= recordDelimiterBytes.length) {
          bufferPosn++;
          break;
        }
      } else if (delPosn != 0) {
        bufferPosn--;   // backs up by one byte only
        delPosn = 0;
      }
    }
It should be:
    for (; bufferPosn < bufferLength; ++bufferPosn) {
      if (buffer[bufferPosn] == recordDelimiterBytes[delPosn]) {
        delPosn++;
        if (delPosn >= recordDelimiterBytes.length) {
          bufferPosn++;
          break;
        }
      } else if (delPosn != 0) {
        // ------------- change here ------------- start ----
        bufferPosn -= delPosn;
        // ------------- change here ------------- end ----
        delPosn = 0;
      }
    }
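To make the difference concrete, here is a minimal standalone sketch that runs both variants of the loop on the example from this report. It is not the Hadoop code: the class name DelimiterScanDemo, the method scan, and the rewindFully flag are hypothetical, and the sketch assumes the whole input sits in a single in-memory buffer, so it does not model the buffer refills that LineReader performs.

```java
import java.nio.charset.StandardCharsets;

public class DelimiterScanDemo {

    // Returns the index just past the first delimiter match, or -1 if none is found.
    // rewindFully == false mimics the reported loop (bufferPosn--);
    // rewindFully == true applies the proposed change (bufferPosn -= delPosn).
    public static int scan(byte[] buffer, byte[] delimiter, boolean rewindFully) {
        int delPosn = 0;
        for (int bufferPosn = 0; bufferPosn < buffer.length; ++bufferPosn) {
            if (buffer[bufferPosn] == delimiter[delPosn]) {
                delPosn++;
                if (delPosn >= delimiter.length) {
                    return bufferPosn + 1;
                }
            } else if (delPosn != 0) {
                // After a partial match fails, step back so that the loop's
                // ++bufferPosn resumes either at the mismatching byte (old)
                // or one byte past the start of the failed match (proposed).
                bufferPosn -= rewindFully ? delPosn : 1;
                delPosn = 0;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        byte[] line = "aaaabccc".getBytes(StandardCharsets.UTF_8);
        byte[] delim = "aaab".getBytes(StandardCharsets.UTF_8);
        System.out.println(scan(line, delim, false)); // prints -1: delimiter missed
        System.out.println(scan(line, delim, true));  // prints 5: delimiter ends at index 5
    }
}
```

With the proposed change the delimiter occupies bytes 1 to 4, so the first record is a and the remainder is ccc, which matches the expected result stated above.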
- relates to: MAPREDUCE-5948 org.apache.hadoop.mapred.LineRecordReader does not handle multibyte record delimiters well (Closed)