  1. Commons IO
  2. IO-354

Commons IO Tailer does not respect UTF-8 Charset

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.3
    • Fix Version/s: 2.5
    • Component/s: Utilities
    • Environment:

      JDK 7
      RHEL Linux
      Apache Commons IO version 2.4

      Description

      I found a defect in org.apache.commons.io.input.Tailer (Tailer.java): the current implementation does not handle files in multi-byte encodings such as UTF-8 correctly. See the following snippet:

      448 private long readLines(RandomAccessFile reader) throws IOException {
      449     StringBuilder sb = new StringBuilder();
      450
      451     long pos = reader.getFilePointer();
      452     long rePos = pos; // position to re-read
      453
      454     int num;
      455     boolean seenCR = false;
      456     while (run && ((num = reader.read(inbuf)) != -1)) {
      457         for (int i = 0; i < num; i++) {
      458             byte ch = inbuf[i];
      459             switch (ch) {
      460             case '\n':
      461                 seenCR = false; // swallow CR before LF
      462                 listener.handle(sb.toString());
      463                 sb.setLength(0);
      464                 rePos = pos + i + 1;
      465                 break;
      466             case '\r':
      467                 if (seenCR) {
      468                     sb.append('\r');
      469                 }
      470                 seenCR = true;
      471                 break;
      472             default:
      473                 if (seenCR) {
      474                     seenCR = false; // swallow final CR
      475                     listener.handle(sb.toString());
      476                     sb.setLength(0);
      477                     rePos = pos + i + 1;
      478                 }
      479                 sb.append((char) ch); // add character, not its ascii value
      480             }
      481         }
      482
      483         pos = reader.getFilePointer();
      484     }
      485
      486     reader.seek(rePos); // Ensure we can re-read if necessary
      487     return rePos;
      488 }

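      At line 479, each byte is cast to char on its own, which breaks the encoding: a character stored as several bytes is appended as several unrelated chars instead of being decoded as one.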
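
      To illustrate the problem and one possible direction for a fix, here is a minimal sketch. It is not the Tailer code and not necessarily what the attached patch does; the class and method names (LineDecodingSketch, castEachByte, decodeLines) are hypothetical. The idea is to collect the raw bytes of each line and decode them with an explicit Charset only once the line is complete. Splitting on the bytes 0x0A and 0x0D is safe for UTF-8, where those values never occur inside a multi-byte sequence, but this assumption would not hold for encodings such as UTF-16.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration class; not part of Commons IO.
class LineDecodingSketch {

    /** Mirrors the reported behavior: every byte is cast to char on its own. */
    static String castEachByte(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append((char) b); // multi-byte sequences are split into wrong chars
        }
        return sb.toString();
    }

    /**
     * Splits the input on LF, CR, or CRLF at the byte level and decodes each
     * complete line with the given charset. Bytes after the last line
     * terminator are not returned, similar to how the quoted code leaves an
     * unterminated line to be re-read later.
     */
    static List<String> decodeLines(byte[] bytes, Charset charset) {
        List<String> lines = new ArrayList<>();
        ByteArrayOutputStream lineBuf = new ByteArrayOutputStream();
        boolean seenCR = false;
        for (byte b : bytes) {
            switch (b) {
            case '\n':
                seenCR = false; // swallow CR before LF
                lines.add(new String(lineBuf.toByteArray(), charset));
                lineBuf.reset();
                break;
            case '\r':
                if (seenCR) {
                    lineBuf.write('\r');
                }
                seenCR = true;
                break;
            default:
                if (seenCR) {
                    seenCR = false; // a lone CR ended the previous line
                    lines.add(new String(lineBuf.toByteArray(), charset));
                    lineBuf.reset();
                }
                lineBuf.write(b); // keep the raw byte; decode later
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        byte[] utf8 = "h\u00e9llo\nw\u00f6rld\r\n".getBytes(StandardCharsets.UTF_8);
        System.out.println(decodeLines(utf8, StandardCharsets.UTF_8));
        System.out.println(castEachByte(utf8).length());
    }
}
```

      With this approach the listener would receive correctly decoded lines as long as the charset passed in matches the file's actual encoding; how that charset is supplied to Tailer is left open here.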

        Attachments

        1. Tailer-commonsio-354.patch
          6 kB
          Peter Liu

              People

              • Assignee: Unassigned
              • Reporter: Liyu Yi (liyuyi)
              • Votes: 0
              • Watchers: 5
