  1. Commons IO
  2. IO-638

Infinite loop in CharSequenceInputStream.read for 4-byte characters with UTF-8 and 3-byte buffer.


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.6
    • Fix Version/s: 2.12.0
    • Component/s: Streams/Writers
    • Labels: None

    Description

      In the constructor of `CharSequenceInputStream` there is the following code to ensure the buffer is large enough to hold one character:

      // Ensure that buffer is long enough to hold a complete character
      final float maxBytesPerChar = encoder.maxBytesPerChar();
      if (bufferSize < maxBytesPerChar) {
          throw new IllegalArgumentException("Buffer size " + bufferSize + " is less than maxBytesPerChar " +
                  maxBytesPerChar);
      }
      

      However, for UTF-8, `maxBytesPerChar` returns 3.0, not 4.0, even though some characters (such as emoji) require 4 bytes to encode. As a result, you can create a `CharSequenceInputStream` with a buffer size of 3. When the stream then tries to fill the buffer with a 4-byte character, `CharsetEncoder.encode` returns normally with an OVERFLOW result but writes nothing to the buffer. The read methods therefore loop forever, because the buffer never receives any bytes.
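
      The encoder behaviour described above can be observed in isolation, without `CharSequenceInputStream`. The following is a minimal sketch, not code from Commons IO: the class `EncoderOverflowDemo`, its `Outcome` holder, and the `encodeInto` helper are names invented for illustration. It encodes a single emoji into output buffers of 3 and 4 bytes and reports whether the result was an overflow and how many bytes were written.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CoderResult;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not part of Commons IO.
class EncoderOverflowDemo {

    /** Outcome of one encode call: whether it overflowed and how many bytes were written. */
    static final class Outcome {
        final boolean overflow;
        final int bytesWritten;

        Outcome(boolean overflow, int bytesWritten) {
            this.overflow = overflow;
            this.bytesWritten = bytesWritten;
        }
    }

    /** Encodes text as UTF-8 into a fresh output buffer of the given capacity. */
    static Outcome encodeInto(String text, int capacity) {
        CharsetEncoder encoder = StandardCharsets.UTF_8.newEncoder();
        ByteBuffer out = ByteBuffer.allocate(capacity);
        // encode() does not throw when the output is too small; it reports OVERFLOW instead.
        CoderResult result = encoder.encode(CharBuffer.wrap(text), out, true);
        return new Outcome(result.isOverflow(), out.position());
    }

    public static void main(String[] args) {
        String emoji = "\uD83D\uDE00"; // U+1F600: one code point, two chars, four UTF-8 bytes
        for (int capacity : new int[] {3, 4}) {
            Outcome o = encodeInto(emoji, capacity);
            System.out.println("capacity=" + capacity
                    + " overflow=" + o.overflow
                    + " bytesWritten=" + o.bytesWritten);
        }
    }
}
```

      With a 3-byte buffer the call reports an overflow and leaves the buffer empty, so a caller that simply retries with the same buffer makes no progress.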

       

      NOTE: As I understand it, the encoder returns 3 rather than 4 because 3 is the maximum number of bytes that a single Java `char` can encode to in UTF-8; a 4-byte UTF-8 sequence corresponds to a surrogate pair of two `char`s.
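
      The per-`char` accounting in this note can be checked with a few standard-library calls. This is an illustrative sketch only; the class `SurrogatePairDemo` and its helper methods are invented names, and the value of `maxBytesPerChar` is whatever the running JDK's UTF-8 encoder reports (3.0 on the JDKs this report was written against).

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not part of Commons IO.
class SurrogatePairDemo {

    /** Value reported by the JDK's UTF-8 encoder for a single Java char. */
    static float utf8MaxBytesPerChar() {
        return StandardCharsets.UTF_8.newEncoder().maxBytesPerChar();
    }

    /** Number of bytes the given text occupies when encoded as UTF-8. */
    static int utf8Length(String text) {
        return text.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        String euro = "\u20AC";        // U+20AC: one char, three UTF-8 bytes
        String emoji = "\uD83D\uDE00"; // U+1F600: surrogate pair, four UTF-8 bytes

        System.out.println("maxBytesPerChar = " + utf8MaxBytesPerChar());
        System.out.println("euro:  chars=" + euro.length() + " bytes=" + utf8Length(euro));
        System.out.println("emoji: chars=" + emoji.length()
                + " codePoints=" + emoji.codePointCount(0, emoji.length())
                + " bytes=" + utf8Length(emoji));
    }
}
```

      The emoji averages 2 bytes per `char`, which is within the encoder's stated maximum, yet it cannot be split: all 4 bytes must fit in the buffer at once.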

       

      This may be a problem for other encodings as well, but I've only tested it with UTF-8.

       

      Requiring the buffer to be at least twice `maxBytesPerChar` would ensure this doesn't happen.
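
      One way the suggested check could look is sketched below. It illustrates the proposal in this report and is not necessarily the fix that was released; `BufferSizeCheck`, `minBufferSize`, and `checkBufferSize` are hypothetical names. The factor of 2 covers a surrogate pair, which is two `char`s that must be encoded together.

```java
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper; not part of Commons IO.
class BufferSizeCheck {

    /** Smallest buffer that can hold two chars' worth of encoded bytes (one surrogate pair). */
    static int minBufferSize(CharsetEncoder encoder) {
        return (int) Math.ceil(2 * encoder.maxBytesPerChar());
    }

    /** Returns bufferSize unchanged if it is large enough, otherwise throws. */
    static int checkBufferSize(int bufferSize, CharsetEncoder encoder) {
        int min = minBufferSize(encoder);
        if (bufferSize < min) {
            throw new IllegalArgumentException(
                    "Buffer size " + bufferSize + " is less than the minimum of " + min
                            + " bytes for charset " + encoder.charset().name());
        }
        return bufferSize;
    }

    public static void main(String[] args) {
        CharsetEncoder utf8 = StandardCharsets.UTF_8.newEncoder();
        System.out.println("minimum for UTF-8: " + minBufferSize(utf8));
        try {
            checkBufferSize(3, utf8);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

      Under this check a 3-byte buffer is rejected at construction time instead of causing a hang later in `read`.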

          People

            Assignee: Unassigned
            Reporter: Thayne McCombs (thayne2)
            Votes: 0
            Watchers: 2
