Details
-
Bug
-
Status: Resolved
-
Critical
-
Resolution: Fixed
-
1.4.0
-
None
Description
The bug is in ResettableFileInputStream.java, in int readChar().
If the last bytes in buf are only part of a multi-byte character, readChar() should not return -1 (ResettableFileInputStream.java:186).
Returning -1 at that point loses the remaining data in the file.
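The failure mode can be reproduced outside Flume with the JDK's CharsetDecoder alone. The sketch below is only an illustration: the class PartialCharDemo and the method readCharNaive are hypothetical names, not Flume code. It shows that when the buffer ends in the middle of a multi-byte character, the decoder produces no char, so a caller that maps "no char produced" to -1 cannot distinguish this case from end of file.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.nio.charset.StandardCharsets;

public class PartialCharDemo {

    /**
     * Hypothetical helper (not Flume code): decodes at most one char from buf.
     * Returns the char, or -1 if the decoder produced nothing.
     */
    static int readCharNaive(CharsetDecoder decoder, ByteBuffer buf) {
        CharBuffer charBuf = CharBuffer.allocate(1);
        // endOfInput = false: more bytes may still follow in the file
        CoderResult res = decoder.decode(buf, charBuf, false);
        if (res.isError()) {
            throw new IllegalStateException("decode failed: " + res);
        }
        charBuf.flip();
        if (charBuf.hasRemaining()) {
            return charBuf.get();
        }
        // Nothing decoded: either no input at all, or only a partial character.
        return -1;
    }

    public static void main(String[] args) {
        // The euro sign is three bytes in UTF-8: E2 82 AC
        byte[] euro = "\u20AC".getBytes(StandardCharsets.UTF_8);
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder();

        // Simulate a buffer that ends after the first two bytes of the character.
        ByteBuffer partial = ByteBuffer.wrap(euro, 0, 2);
        int first = readCharNaive(decoder, partial);
        System.out.println("partial buffer -> " + first
                + ", undecoded bytes left: " + partial.remaining());

        // With the complete byte sequence available, the character decodes.
        ByteBuffer complete = ByteBuffer.wrap(euro);
        int second = readCharNaive(decoder, complete);
        System.out.println("complete buffer -> U+"
                + Integer.toHexString(second).toUpperCase());
    }
}
```

In the partial case the undecoded bytes are still in the buffer, so no data has actually been lost yet. The loss happens only if the caller treats the -1 as end of input and stops reading.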
I tried to fix it as follows:
public synchronized int readChar() throws IOException {
  // if (!buf.hasRemaining()) {
  if (buf.limit() - buf.position() < 10) {
    refillBuf();
  }
  int start = buf.position();
  charBuf.clear();
  boolean isEndOfInput = false;
  if (position >= fileSize) {
    isEndOfInput = true;
  }
  CoderResult res = decoder.decode(buf, charBuf, isEndOfInput);
  if (res.isMalformed() || res.isUnmappable()) {
    res.throwException();
  }
  int delta = buf.position() - start;
  charBuf.flip();
  if (charBuf.hasRemaining()) {
    char c = charBuf.get();
    incrPosition(delta, true);
    return c;
  } else {
    incrPosition(delta, false);
    return -1;
  }
}
This avoids the partial character, but it introduces a new issue: sometimes a line of a log file contains a repeated character.
E.g.:
original file: 123456
sink file: 1233456
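One general way to avoid both losing and repeating bytes when a character straddles a buffer boundary is to keep the undecoded tail with ByteBuffer.compact() and append new bytes after it, so each byte is handed to the decoder exactly once. The following is a minimal sketch of that technique under stated assumptions, not the change that was applied to Flume: the class CarryOverDecodeSketch and the method decodeAll are hypothetical, the input is a byte array rather than a file, and it does not track positions or support reset the way ResettableFileInputStream must.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.nio.charset.StandardCharsets;

public class CarryOverDecodeSketch {

    /**
     * Hypothetical helper: decodes UTF-8 input through a small fixed-size
     * buffer, carrying partial characters over to the next refill.
     */
    static String decodeAll(byte[] input, int bufSize) {
        if (bufSize < 4) {
            // A UTF-8 character can take up to 4 bytes; it must fit in the buffer.
            throw new IllegalArgumentException("bufSize must be at least 4");
        }
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder();
        ByteBuffer buf = ByteBuffer.allocate(bufSize);
        CharBuffer chars = CharBuffer.allocate(bufSize);
        StringBuilder out = new StringBuilder();

        int offset = 0;
        boolean eof = false;
        while (!eof) {
            // Refill: new bytes go after any leftover bytes kept by compact().
            int n = Math.min(buf.remaining(), input.length - offset);
            buf.put(input, offset, n);
            offset += n;
            eof = (offset == input.length);

            buf.flip();
            CoderResult res;
            do {
                res = decoder.decode(buf, chars, eof);
                if (res.isError()) {
                    throw new IllegalArgumentException("decode failed: " + res);
                }
                chars.flip();
                out.append(chars);   // appends the decoded chars
                chars.clear();
            } while (res.isOverflow()); // output buffer was full, decode more

            // Keep the bytes of an incomplete character for the next round.
            buf.compact();
        }
        decoder.flush(chars);
        chars.flip();
        out.append(chars);
        return out.toString();
    }

    public static void main(String[] args) {
        String original = "123456\u20AC\u65E5\u672C\u8A9Eabc";
        String decoded = decodeAll(original.getBytes(StandardCharsets.UTF_8), 4);
        System.out.println("round trip intact: " + original.equals(decoded));
    }
}
```

The 4-byte buffer in main is deliberately tiny so that multi-byte characters are split across refills. A real reader would use a much larger buffer, but the carry-over logic stays the same.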
Attachments
Issue Links
- duplicates
-
FLUME-2215 ResettableFileInputStream can't support ucs-4 character
- Closed
- is duplicated by
-
FLUME-2241 Spooling Directory Source doesn't handle 2 byte UTF-8 encoded characters correctly
- Closed
- relates to
-
FLUME-2241 Spooling Directory Source doesn't handle 2 byte UTF-8 encoded characters correctly
- Closed