  1. Apache Jena
  2. JENA-2186

Write U+FFFD as Unicode escape


Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Done
    • Affects Version/s: Jena 4.2.0
    • Fix Version/s: Jena 4.3.0
    • Component/s: None
    • Labels: None

    Description

      U+FFFD (the Unicode replacement character) arises when there is an encoding mismatch between the input bytes and UTF-8 (see the Wikipedia article).
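
As a minimal sketch of how such a mismatch produces U+FFFD: the text below is encoded as ISO-8859-1 and then wrongly decoded as UTF-8. The class and method names are illustrative only and are not part of Jena.

```java
import java.nio.charset.StandardCharsets;

// Illustrative only: not part of Jena.
final class ReplacementCharDemo {
    /** Encode text as ISO-8859-1, then (wrongly) decode those bytes as UTF-8. */
    static String misdecode(String text) {
        byte[] latin1 = text.getBytes(StandardCharsets.ISO_8859_1);
        // new String(byte[], Charset) replaces malformed input with U+FFFD.
        return new String(latin1, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // The e-acute becomes the single byte 0xE9, which is not valid UTF-8 here.
        String decoded = misdecode("caf\u00E9 au lait");
        System.out.println(decoded.indexOf('\uFFFD') >= 0);
    }
}
```

The decoder cannot interpret the byte 0xE9 followed by a space as UTF-8, so it substitutes the replacement character instead of failing.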

      The tokenizer for Turtle, N-Triples, etc. raises a warning when a literal U+FFFD is encountered, to notify users and applications of potential problems.

      The tokenizer does not warn if it is written intentionally in the input stream as \uFFFD (6 characters).
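
The distinction between the two input forms can be sketched as follows. This is a hypothetical scanner written for illustration, not Jena's tokenizer: it records a warning for a raw U+FFFD, decodes a four-digit backslash-u escape silently, and copies any other escape through unchanged.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical scanner for illustration; Jena's tokenizer is not shown here.
final class ReplacementCharScan {
    final List<String> warnings = new ArrayList<>();

    String unescape(String input) {
        StringBuilder out = new StringBuilder(input.length());
        int i = 0;
        while (i < input.length()) {
            char ch = input.charAt(i);
            if (ch == '\uFFFD') {
                // Raw replacement character: likely an encoding problem upstream.
                warnings.add("Unicode replacement character U+FFFD at index " + i);
                out.append(ch);
                i++;
            } else if (ch == '\\' && i + 6 <= input.length() && input.charAt(i + 1) == 'u') {
                // Explicit 6-character escape: decode it without warning.
                int codeUnit = 0;
                for (int k = i + 2; k < i + 6; k++) {
                    int digit = Character.digit(input.charAt(k), 16);
                    if (digit < 0) {
                        throw new IllegalArgumentException("Bad unicode escape at index " + i);
                    }
                    codeUnit = codeUnit * 16 + digit;
                }
                out.append((char) codeUnit);
                i += 6;
            } else if (ch == '\\' && i + 1 < input.length()) {
                // Any other escape (including an escaped backslash) is copied as-is.
                out.append(ch).append(input.charAt(i + 1));
                i += 2;
            } else {
                out.append(ch);
                i++;
            }
        }
        return out.toString();
    }
}
```

Both inputs produce the same decoded string; only the raw form adds an entry to `warnings`.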

      The writer should use this Unicode escape form, so that the character U+FFFD is written out and read back in without a warning.
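
A minimal sketch of the proposed writer behaviour, assuming a hypothetical helper applied to a literal's lexical form before output (the helper name is invented for illustration and is not Jena's API):

```java
// Hypothetical helper for illustration; not Jena's writer code.
final class ReplacementCharEscaper {
    /** Replace each raw U+FFFD with the 6-character backslash-u escape. */
    static String escapeReplacementChar(String lexical) {
        StringBuilder out = new StringBuilder(lexical.length());
        for (int i = 0; i < lexical.length(); i++) {
            char ch = lexical.charAt(i);
            if (ch == '\uFFFD') {
                out.append("\\uFFFD"); // backslash, 'u', 'F', 'F', 'F', 'D'
            } else {
                out.append(ch);
            }
        }
        return out.toString();
    }
}
```

For example, a three-character string consisting of `a`, U+FFFD, and `b` is written as the eight characters `a\uFFFDb`, which a reader can parse back to the original string without seeing a raw replacement character.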

       

            People

              Assignee: Andy Seaborne
              Reporter: Andy Seaborne