Apache Cordova / CB-13570

FileReader#readAsText fails with multi-byte UTF-8 characters


    Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 5.0.0, 4.2.0
    • Fix Version/s: None
    • Component/s: cordova-plugin-file
    • Labels:
      None
    • Environment:

      Description

      `FileReader#readAsText` reads the file in chunks of 256KB. If the file contains a multi-byte UTF-8 character that is split into two separate chunks, reading fails with an encoding error (ENCODING_ERR: 5).
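      To make this failure mode concrete, here is a minimal TypeScript sketch that mimics it outside the plugin: it cuts a byte array into fixed-size chunks and decodes each chunk independently with a strict (`fatal: true`) `TextDecoder`. The chunk size is shrunk from 256 KB to a few bytes so the boundary is easy to place. `decodeChunksIndependently` is a hypothetical helper that illustrates the mechanism; it is not the plugin's actual native code.

```typescript
// Hypothetical helper: decodes each fixed-size chunk on its own, which is
// how a reader that converts every chunk separately behaves.
function decodeChunksIndependently(bytes: Uint8Array, chunkSize: number): string {
  // fatal: true makes decode() throw a TypeError on malformed or truncated
  // UTF-8 instead of silently substituting U+FFFD.
  const decoder = new TextDecoder("utf-8", { fatal: true });
  let text = "";
  for (let start = 0; start < bytes.length; start += chunkSize) {
    // Without { stream: true }, every call treats its chunk as the complete
    // input, so a chunk that ends partway through a character is rejected.
    text += decoder.decode(bytes.subarray(start, start + chunkSize));
  }
  return text;
}

// "abc\u0153def" encodes to 61 62 63 C5 93 64 65 66 ('œ' is C5 93).
const sample = new TextEncoder().encode("abc\u0153def");

// Chunk size 3: 61 62 63 | C5 93 64 | 65 66, so no character is split.
console.log(decodeChunksIndependently(sample, 3)); // abcœdef

// Chunk size 4: 61 62 63 C5 | 93 64 65 66, so 'œ' is split and decode() throws.
try {
  decodeChunksIndependently(sample, 4);
} catch (err) {
  console.log("decode failed:", err instanceof TypeError);
}
```

      A streaming decoder (`decode(chunk, { stream: true })`) would instead carry the dangling lead byte over to the next call, which is the other common way to avoid the problem.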

      For many apps this is not an issue. However, if a file is larger than 256KB and contains many multi-byte characters, this is likely to happen.

      I have not experienced this issue on Android yet.

      Code that demonstrates the issue: https://gist.github.com/anonymous/0fdc1ec212be1e29309820477257a0c3

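      In the example, a chunk boundary falls inside the '\u0153' ('œ') character. In UTF-8 this character is the two bytes 0xC5 0x93, so one chunk ends with 0xC5 and the next begins with 0x93, and neither chunk decodes as valid UTF-8 on its own.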

      A workaround is to use readAsArrayBuffer instead and do the decoding in JavaScript. However, the decoding can be quite slow on iOS, where a native TextDecoder is not available.
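      A minimal sketch of this workaround, assuming the file contents have already been read with `readAsArrayBuffer` into an `ArrayBuffer`: use `TextDecoder` when the runtime provides it, and otherwise fall back to a decoder written in plain JavaScript. `decodeUtf8` and `decodeUtf8Fallback` are hypothetical names, and the fallback assumes well-formed UTF-8. Its per-byte loop is the kind of code that can be slow compared with native decoding.

```typescript
// Hypothetical fallback for runtimes without TextDecoder: a plain
// JavaScript UTF-8 decoder. It assumes well-formed input (no validation of
// overlong forms, surrogates or truncated sequences) and, unlike
// TextDecoder, keeps a leading byte order mark.
function decodeUtf8Fallback(bytes: Uint8Array): string {
  const parts: string[] = [];
  let units: number[] = []; // UTF-16 code units waiting to be flushed
  let i = 0;
  while (i < bytes.length) {
    const b0 = bytes[i];
    let cp: number;
    if (b0 < 0x80) { // 0xxxxxxx: 1-byte sequence
      cp = b0;
      i += 1;
    } else if (b0 < 0xe0) { // 110xxxxx 10xxxxxx: 2-byte sequence
      cp = ((b0 & 0x1f) << 6) | (bytes[i + 1] & 0x3f);
      i += 2;
    } else if (b0 < 0xf0) { // 1110xxxx + 2 continuation bytes
      cp = ((b0 & 0x0f) << 12) | ((bytes[i + 1] & 0x3f) << 6) | (bytes[i + 2] & 0x3f);
      i += 3;
    } else { // 11110xxx + 3 continuation bytes
      cp = ((b0 & 0x07) << 18) | ((bytes[i + 1] & 0x3f) << 12) |
        ((bytes[i + 2] & 0x3f) << 6) | (bytes[i + 3] & 0x3f);
      i += 4;
    }
    if (cp >= 0x10000) {
      // Code points above the BMP become a UTF-16 surrogate pair.
      cp -= 0x10000;
      units.push(0xd800 + (cp >> 10), 0xdc00 + (cp & 0x3ff));
    } else {
      units.push(cp);
    }
    // Flush in batches: passing a huge array as arguments to
    // String.fromCharCode can exceed the engine's argument limit.
    if (units.length >= 8192) {
      parts.push(String.fromCharCode(...units));
      units = [];
    }
  }
  parts.push(String.fromCharCode(...units));
  return parts.join("");
}

// Decodes the ArrayBuffer produced by readAsArrayBuffer, preferring the
// native TextDecoder when the runtime provides one.
function decodeUtf8(buffer: ArrayBuffer): string {
  const bytes = new Uint8Array(buffer);
  if (typeof TextDecoder !== "undefined") {
    return new TextDecoder("utf-8").decode(bytes);
  }
  return decodeUtf8Fallback(bytes);
}
```

      With a `FileReader`, this would typically be called from the `onloadend` handler as `decodeUtf8(reader.result as ArrayBuffer)`. Because the whole buffer is decoded in one pass, there are no chunk boundaries for a character to fall across.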

      One solution would be to make the chunk size semi-flexible so that each chunk ends on a character boundary (for example, by enlarging the chunk until decoding succeeds).
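      The proposal above can be sketched at the byte level. Rather than re-running the decoder until it succeeds, this TypeScript sketch swaps in a cheaper check that is equivalent for valid UTF-8: a boundary never splits a character as long as the byte right after it is not a continuation byte (`10xxxxxx`), so each chunk is extended past any continuation bytes. `splitAtCharBoundaries` is a hypothetical helper working on an in-memory `Uint8Array`; an actual fix would have to live in the plugin's native file-reading code.

```typescript
// True if b is a UTF-8 continuation byte (bit pattern 10xxxxxx).
function isContinuationByte(b: number): boolean {
  return (b & 0xc0) === 0x80;
}

// Hypothetical helper: splits bytes into chunks of roughly nominalSize
// bytes, enlarging a chunk just enough that the next one does not start
// with a continuation byte. For valid UTF-8 no character is ever split,
// and a chunk grows by at most 3 bytes.
function splitAtCharBoundaries(bytes: Uint8Array, nominalSize: number): Uint8Array[] {
  if (nominalSize < 1) {
    throw new RangeError("nominalSize must be at least 1");
  }
  const chunks: Uint8Array[] = [];
  let start = 0;
  while (start < bytes.length) {
    let end = Math.min(start + nominalSize, bytes.length);
    while (end < bytes.length && isContinuationByte(bytes[end])) {
      end++;
    }
    chunks.push(bytes.subarray(start, end));
    start = end;
  }
  return chunks;
}

// With a 4-byte nominal size, 'œ' (C5 93) would straddle bytes 3 and 4;
// the first chunk is extended to 5 bytes, so every chunk decodes on its own.
const data = new TextEncoder().encode("abc\u0153def");
const strict = new TextDecoder("utf-8", { fatal: true });
const decoded = splitAtCharBoundaries(data, 4).map((c) => strict.decode(c)).join("");
console.log(decoded); // abcœdef
```

      Inspecting at most three bytes per boundary keeps the cost constant, whereas retrying a strict decode on a 256 KB chunk would repeat work on every failed attempt.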


              People

              • Assignee:
                Unassigned
                Reporter:
                Ralf Kistner (rkistner)
              • Votes:
                2
                Watchers:
                4

                Dates

                • Created:
                  Updated: