Details
- Type: Improvement
- Status: Closed
- Priority: Major
- Resolution: Done
- Jena 4.2.0
- None
- None
Description
U+FFFD (the Unicode replacement character) typically appears when input bytes are decoded as UTF-8 but are not valid UTF-8, for example because the data was actually written in a different encoding (see the Wikipedia article).
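To show how such a mismatch produces U+FFFD, here is a small standalone sketch using only the JDK (no Jena classes). It encodes a string as ISO-8859-1 and then decodes the bytes as UTF-8. The class and method names are illustrative only.

```java
import java.nio.charset.StandardCharsets;

public class ReplacementCharDemo {

    // Decode bytes as UTF-8. The String(byte[], Charset) constructor always
    // replaces malformed input with the charset's replacement, U+FFFD for UTF-8.
    public static String decodeAsUtf8(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // "caf" followed by e-acute, encoded as ISO-8859-1: the accented letter
        // becomes the single byte 0xE9.
        byte[] latin1Bytes = "caf\u00e9".getBytes(StandardCharsets.ISO_8859_1);

        // 0xE9 starts a three-byte UTF-8 sequence, but no continuation bytes
        // follow, so the decoder substitutes the replacement character.
        String decoded = decodeAsUtf8(latin1Bytes);

        System.out.println("decoded length: " + decoded.length());
        System.out.println("index of U+FFFD: " + decoded.indexOf('\uFFFD'));
    }
}
```

The original byte 0xE9 is lost at this point: once it has been replaced, the text only records that something could not be decoded.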
The tokenizer for Turtle, N-Triples, etc. raises a warning when it encounters a raw U+FFFD character in the input, to alert users and applications to a possible encoding problem.
The tokenizer does not warn if the character is written deliberately in the input as the escape sequence \uFFFD (six characters).
The writer should use this Unicode escape form, so that the character U+FFFD can be written out and read back in without a warning.
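The following is a minimal sketch of the intended behaviour for a simplified double-quoted string literal, assuming only a handful of escapes. The writer emits U+FFFD as the six-character escape. A tokenizer-style check warns only on the raw character. Decoding the escape restores the original string. The names `escapeLexicalForm`, `wouldWarn` and `unescapeLexicalForm` are hypothetical and are not Jena's API. A real Turtle/N-Triples writer and tokenizer handle many more cases.

```java
public class FffdEscapeSketch {

    // Hypothetical writer helper: escape a lexical form for a double-quoted
    // N-Triples/Turtle string, emitting U+FFFD as six ASCII characters.
    public static String escapeLexicalForm(String s) {
        StringBuilder sb = new StringBuilder(s.length() + 8);
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '"':      sb.append("\\\""); break;
                case '\\':     sb.append("\\\\"); break;
                case '\n':     sb.append("\\n"); break;
                case '\r':     sb.append("\\r"); break;
                case '\uFFFD': sb.append("\\uFFFD"); break;
                default:       sb.append(c);
            }
        }
        return sb.toString();
    }

    // Hypothetical tokenizer-side check: warn only if the raw character U+FFFD
    // occurs in the input text. The escaped form is plain ASCII, so it never
    // triggers the warning.
    public static boolean wouldWarn(String inputText) {
        return inputText.indexOf('\uFFFD') >= 0;
    }

    // Hypothetical reader helper: undo the escapes produced above, including
    // any four-hex-digit Unicode escape.
    public static String unescapeLexicalForm(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c != '\\') {
                sb.append(c);
                continue;
            }
            char next = s.charAt(++i);
            switch (next) {
                case '"':  sb.append('"'); break;
                case '\\': sb.append('\\'); break;
                case 'n':  sb.append('\n'); break;
                case 'r':  sb.append('\r'); break;
                case 'u':
                    // The four hex digits follow the 'u'.
                    sb.append((char) Integer.parseInt(s.substring(i + 1, i + 5), 16));
                    i += 4;
                    break;
                default:
                    throw new IllegalArgumentException("Unsupported escape: \\" + next);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String lexical = "bad byte: \uFFFD"; // contains the raw character
        String written = escapeLexicalForm(lexical);
        System.out.println("\"" + written + "\"");
        System.out.println("warn on written form? " + wouldWarn(written));
        System.out.println("warn on raw form? " + wouldWarn(lexical));
        System.out.println("round trip ok? " + unescapeLexicalForm(written).equals(lexical));
    }
}
```

Escaping on output keeps the warning meaningful. A raw U+FFFD in the input still signals a likely decoding problem, and a replacement character that was already in the data is preserved without raising a false alarm.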
Issue Links
- is related to: JENA-2179 TDB throws Unicode Replacement Character exception while fetching data (Closed)
- links to