[IO-354] Commons IO Tailer does not respect UTF-8 Charset - ASF JIRA

Details

Type: Bug
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: 2.3
Fix Version/s: 2.5
Component/s: Utilities
Labels:
- Charset
- Encoding
- Tailer
Environment:

JDK 7
RHEL Linux
Apache Commons IO version 2.4

Description

I just realized there is a defect in the source code of org.apache.commons.io.input.Tailer (Tailer.java). The current implementation does not correctly handle files in a multi-byte encoding such as UTF-8. See the following snippet:

448 private long readLines(RandomAccessFile reader) throws IOException {
449     StringBuilder sb = new StringBuilder();
450
451     long pos = reader.getFilePointer();
452     long rePos = pos; // position to re-read
453
454     int num;
455     boolean seenCR = false;
456     while (run && ((num = reader.read(inbuf)) != -1)) {
457         for (int i = 0; i < num; i++) {
458             byte ch = inbuf[i];
459             switch (ch) {
460             case '\n':
461                 seenCR = false; // swallow CR before LF
462                 listener.handle(sb.toString());
463                 sb.setLength(0);
464                 rePos = pos + i + 1;
465                 break;
466             case '\r':
467                 if (seenCR) {
468                     sb.append('\r');
469                 }
470                 seenCR = true;
471                 break;
472             default:
473                 if (seenCR) {
474                     seenCR = false; // swallow final CR
475                     listener.handle(sb.toString());
476                     sb.setLength(0);
477                     rePos = pos + i + 1;
478                 }
479                 sb.append((char) ch); // add character, not its ascii value
480             }
481         }
482
483         pos = reader.getFilePointer();
484     }
485
486     reader.seek(rePos); // Ensure we can re-read if necessary
487     return rePos;
488 }

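At line 479, each byte is cast to a char on its own. This breaks the encoding: a character stored as several bytes (as in UTF-8) is turned into several unrelated chars instead of being decoded as one character.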
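To make the effect concrete, here is a minimal, self-contained sketch. It is not the Tailer code and not the attached patch. `TailerCharsetSketch`, `castEachByte`, and `decodeLines` are hypothetical names introduced only for illustration. The first method mimics the per-byte cast from line 479. The second shows one possible alternative, assuming the line's raw bytes can be buffered and decoded with an explicit `Charset` once the line terminator is seen. For brevity the sketch simply drops every CR byte, which is looser than the CR handling in the snippet above.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class TailerCharsetSketch {

    // Mimics line 479: every byte becomes a char by itself.
    // Bytes >= 0x80 are negative in Java, so the cast also sign-extends them.
    static String castEachByte(byte[] data) {
        StringBuilder sb = new StringBuilder();
        for (byte b : data) {
            sb.append((char) b);
        }
        return sb.toString();
    }

    // Hypothetical alternative: collect the raw bytes of a line and decode
    // them in one step with the given charset when LF is reached.
    // Simplification: all CR bytes are dropped. Splitting on the LF byte is
    // safe for ASCII-compatible encodings such as UTF-8, not for UTF-16.
    static List<String> decodeLines(byte[] data, Charset charset) {
        List<String> lines = new ArrayList<>();
        ByteArrayOutputStream lineBuf = new ByteArrayOutputStream();
        for (byte b : data) {
            if (b == '\n') {
                lines.add(new String(lineBuf.toByteArray(), charset));
                lineBuf.reset();
            } else if (b != '\r') {
                lineBuf.write(b);
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        // "h", e-acute (two bytes in UTF-8), "llo", LF
        byte[] data = "h\u00e9llo\n".getBytes(StandardCharsets.UTF_8);

        String broken = castEachByte(data);
        String decoded = decodeLines(data, StandardCharsets.UTF_8).get(0);

        System.out.println("per-byte cast round-trips: " + broken.equals("h\u00e9llo\n"));
        System.out.println("charset decode round-trips: " + decoded.equals("h\u00e9llo"));
    }
}
```

The point of the design is that decoding happens on a whole line of bytes rather than byte by byte, so a multi-byte sequence is never split before it reaches the decoder.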

Attachments

Tailer-commonsio-354.patch
10/Apr/13 00:07
6 kB
Peter Liu

Issue Links

is related to

IO-377 Tailer uses default charset to read the file

Closed

People

Assignee: Unassigned
Reporter: Liyu Yi
Votes: 0
Watchers: 5

Dates

Created: 27/Oct/12 00:08
Updated: 08/Nov/16 17:59
Resolved: 16/May/13 13:33