  1. Commons CSV
  2. CSV-67

UnicodeUnescapeReader should not be applied before parsing

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: None
    • Labels:
      None

      Description

      The UnicodeUnescapeReader is currently applied to the input before it is parsed.

      This means that Unicode escapes are treated differently from other escapes.

      For example, the sequence <esc>r<esc>n is not treated as a newline for the purpose of recognising the end of a record, yet \u000D\u000A is converted to CRLF and would terminate the record (unless embedded in a quoted string).
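The difference can be reproduced with a minimal Java sketch. It does not use the Commons CSV API: `UnescapeOrderDemo`, `unescapeUnicode` and `splitRecords` are hypothetical stand-ins, and record splitting is reduced to splitting on CRLF (no quoting, no delimiter handling). It only illustrates that unescaping before splitting changes the record count.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class UnescapeOrderDemo {

    // True if the four characters starting at 'from' are all hex digits.
    static boolean isHex4(String s, int from) {
        if (from + 4 > s.length()) {
            return false;
        }
        for (int k = from; k < from + 4; k++) {
            if (Character.digit(s.charAt(k), 16) < 0) {
                return false;
            }
        }
        return true;
    }

    // Hypothetical helper: replaces each backslash + 'u' + four hex digits
    // with the character it names; everything else is copied unchanged.
    static String unescapeUnicode(String s) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < s.length()) {
            if (s.startsWith("\\u", i) && isHex4(s, i + 2)) {
                out.append((char) Integer.parseInt(s.substring(i + 2, i + 6), 16));
                i += 6;
            } else {
                out.append(s.charAt(i++));
            }
        }
        return out.toString();
    }

    // Simplified stand-in for the parser: a record ends at a real CRLF.
    static List<String> splitRecords(String s) {
        return new ArrayList<>(Arrays.asList(s.split("\r\n")));
    }

    public static void main(String[] args) {
        // The input contains the escape text itself, not real CR/LF characters.
        String input = "a,b\\u000D\\u000Ac,d";

        // Unescape first (the behaviour reported here): the escapes become a
        // real CRLF and the input is split into two records.
        System.out.println(splitRecords(unescapeUnicode(input)).size()); // 2

        // Split first, unescape afterwards: the escapes do not end the record.
        System.out.println(splitRecords(input).size()); // 1
    }
}
```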

      Unicode escape processing (if selected) should occur as part of parsing, just as ordinary escape processing does.

      The class can be made public so that users can wrap the input themselves if they need the current behaviour; this preserves the existing functionality, so there is no need to introduce another setting.
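Wrapping the input could look roughly like the sketch below. `SketchUnescapeReader` is a hypothetical stand-in written for illustration, not the library's UnicodeUnescapeReader, whose constructor and behaviour are not shown in this issue. The sketch assumes an escape is a backslash, the letter u, and exactly four hex digits, and it does not treat a doubled backslash specially.

```java
import java.io.IOException;
import java.io.PushbackReader;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

final class SketchUnescapeReader extends Reader {

    // Pushback capacity of 5: the 'u' plus four hex digits read as look-ahead.
    private final PushbackReader source;

    SketchUnescapeReader(Reader in) {
        this.source = new PushbackReader(in, 5);
    }

    @Override
    public int read() throws IOException {
        int c = source.read();
        if (c != '\\') {
            return c;
        }
        // Look ahead up to five characters after the backslash.
        char[] buf = new char[5];
        int n = 0;
        while (n < 5) {
            int r = source.read();
            if (r < 0) {
                break;
            }
            buf[n++] = (char) r;
        }
        if (n == 5 && buf[0] == 'u' && isHex(buf, 1, 4)) {
            return Integer.parseInt(new String(buf, 1, 4), 16);
        }
        // Not an escape: push the look-ahead back and return the backslash.
        source.unread(buf, 0, n);
        return c;
    }

    @Override
    public int read(char[] cbuf, int off, int len) throws IOException {
        int count = 0;
        while (count < len) {
            int c = read();
            if (c < 0) {
                break;
            }
            cbuf[off + count++] = (char) c;
        }
        return (count == 0 && len > 0) ? -1 : count;
    }

    @Override
    public void close() throws IOException {
        source.close();
    }

    private static boolean isHex(char[] buf, int from, int len) {
        for (int k = from; k < from + len; k++) {
            if (Character.digit(buf[k], 16) < 0) {
                return false;
            }
        }
        return true;
    }

    // Convenience for the demo: drains a reader into a String.
    static String readAll(Reader r) {
        try {
            StringBuilder sb = new StringBuilder();
            int c;
            while ((c = r.read()) >= 0) {
                sb.append((char) c);
            }
            return sb.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // The caller opts in to early unescaping by wrapping the input.
        Reader wrapped = new SketchUnescapeReader(new StringReader("a\\u0041b"));
        System.out.println(readAll(wrapped)); // aAb
    }
}
```

With this approach the parser itself would never see the escapes, so a caller who wraps the input is explicitly choosing the old behaviour, including escaped CRLF ending a record.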

            People

            • Assignee: Unassigned
            • Reporter: Sebb