Description
I just realized there is a defect in the source code of "org.apache.commons.io.input.Tailer": the current implementation does not work for files in a multi-byte encoding. See the following snippet:
448 private long readLines(RandomAccessFile reader) throws IOException {
449     StringBuilder sb = new StringBuilder();
450
451     long pos = reader.getFilePointer();
452     long rePos = pos; // position to re-read
453
454     int num;
455     boolean seenCR = false;
456     while (run && ((num = reader.read(inbuf)) != -1)) {
457         for (int i = 0; i < num; i++) {
458             byte ch = inbuf[i];
459             switch (ch) {
460             case '\n':
461                 seenCR = false; // swallow CR before LF
462                 listener.handle(sb.toString());
463                 sb.setLength(0);
464                 rePos = pos + i + 1;
465                 break;
466             case '\r':
467                 if (seenCR)
...
470                 seenCR = true;
471                 break;
472             default:
473                 if (seenCR)
...
479                 sb.append((char) ch); // add character, not its ascii value
480             }
481         }
482
483         pos = reader.getFilePointer();
484     }
485
486     reader.seek(rePos); // Ensure we can re-read if necessary
487     return rePos;
488 }
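At line 479, each byte is cast to a char on its own, which breaks the encoding: a character that is encoded as several bytes (for example, any non-ASCII character in UTF-8) is split into several unrelated chars, so the line handed to the listener is corrupted.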
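To illustrate, here is a minimal sketch and not the actual Tailer code or its eventual fix. The class TailerDecodingSketch and the methods castEachByte and splitLines are hypothetical names made up for this example. castEachByte mirrors the per-byte cast at line 479. splitLines shows one possible alternative: collect the raw bytes of a line and decode them in one step with an explicit Charset. It assumes an ASCII-compatible encoding such as UTF-8, where the byte 0x0A can only be a line feed, and it simplifies line-ending handling by dropping every CR.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

class TailerDecodingSketch {

    // Mirrors line 479: every byte becomes its own char, so the bytes of a
    // multi-byte character are never combined.
    static String castEachByte(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append((char) b);
        }
        return sb.toString();
    }

    // Hypothetical alternative: buffer the raw bytes of each line and decode
    // the whole line at once. Assumes an ASCII-compatible charset.
    static List<String> splitLines(byte[] data, Charset charset) {
        List<String> lines = new ArrayList<>();
        ByteArrayOutputStream lineBuf = new ByteArrayOutputStream();
        for (byte b : data) {
            if (b == '\n') {
                // End of line: decode everything collected so far.
                lines.add(new String(lineBuf.toByteArray(), charset));
                lineBuf.reset();
            } else if (b != '\r') {
                // Simplification: all CR bytes are dropped.
                lineBuf.write(b);
            }
        }
        // Bytes after the last '\n' are an incomplete line and are not returned.
        return lines;
    }

    public static void main(String[] args) {
        byte[] word = "caf\u00e9".getBytes(StandardCharsets.UTF_8);
        System.out.println(castEachByte(word).equals("caf\u00e9")); // prints false

        byte[] data = "caf\u00e9\r\n".getBytes(StandardCharsets.UTF_8);
        System.out.println(splitLines(data, StandardCharsets.UTF_8).get(0).equals("caf\u00e9")); // prints true
    }
}
```

In UTF-8 the "é" in the example is stored as two bytes, so the cast produces five chars for a four-character word, while decoding the buffered bytes restores the original text.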
Issue Links
- is related to IO-377 Tailer uses default charset to read the file (Closed)