Details
Type: Bug
Status: Resolved
Priority: Major
Resolution: Duplicate
Affects Version/s: 2.6
Fix Version/s: None
Component/s: None
Environment: Windows XP, Sun JRE 1.5.0_03
Description
Passing a surrogate pair to a TransformerHandler causes incorrect UTF-8 to be generated. The following code illustrates the problem:
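import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.helpers.AttributesImpl;

// The wrapper class name is illustrative only.
public class SurrogateTest {
    public static void main(String[] args) throws Exception {
        SAXTransformerFactory transformerFactory = (SAXTransformerFactory)SAXTransformerFactory.newInstance();
        TransformerHandler handler = transformerFactory.newTransformerHandler();
        handler.setResult(new StreamResult(System.out));
        char[] chars = new char[2];
        chars[0] = (char)0xD803; // high surrogate
        chars[1] = (char)0xDD75; // low surrogate; together they encode U+10D75
        handler.startDocument();
        handler.startElement("","","foo", new AttributesImpl());
        handler.characters(chars, 0, chars.length);
        handler.endElement("","","foo");
        handler.endDocument();
    }
}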
If you take the output of this program and try to parse it with the Xerces SAX Parser, you get an "Invalid byte 2 of 3-byte UTF-8 sequence" exception.
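For reference, here is a small sketch of what the serializer would be expected to emit for this pair, next to one possible way of getting it wrong. The report does not say which bytes were actually written, so the "each surrogate encoded separately" variant is only an assumption about the failure mode. It is offered because a 3-byte sequence starting with 0xED followed by a byte of 0xA0 or higher is not valid UTF-8, which would be consistent with a complaint about byte 2 of a 3-byte sequence. The class and method names are made up for this illustration, and StandardCharsets needs Java 7 or later (on the JRE 1.5 in the report, getBytes("UTF-8") would be the equivalent).

```java
import java.nio.charset.StandardCharsets;

public class SurrogateUtf8Check {

    // Formats a byte array as space-separated upper-case hex.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    // Correct encoding: the pair is treated as one supplementary code point
    // and comes out as a single 4-byte UTF-8 sequence.
    static byte[] encodePair(char high, char low) {
        return new String(new char[] { high, low }).getBytes(StandardCharsets.UTF_8);
    }

    // ASSUMED failure mode (not confirmed by the report): each surrogate is
    // run through the 3-byte UTF-8 pattern on its own, giving 6 invalid bytes.
    static byte[] encodeEachSurrogate(char high, char low) {
        byte[] out = new byte[6];
        int i = 0;
        for (char c : new char[] { high, low }) {
            out[i++] = (byte) (0xE0 | (c >> 12));
            out[i++] = (byte) (0x80 | ((c >> 6) & 0x3F));
            out[i++] = (byte) (0x80 | (c & 0x3F));
        }
        return out;
    }

    public static void main(String[] args) {
        char high = (char) 0xD803;
        char low = (char) 0xDD75;
        System.out.println("code point: U+"
                + Integer.toHexString(Character.toCodePoint(high, low)).toUpperCase());
        System.out.println("expected UTF-8:       " + hex(encodePair(high, low)));
        System.out.println("per-surrogate (bad):  " + hex(encodeEachSurrogate(high, low)));
    }
}
```

The pair from the report combines to U+10D75, whose UTF-8 form is F0 90 B5 B5. Encoding the two surrogates separately would give ED A0 83 ED B5 B5 instead. Comparing a hex dump of the program's real output against these two byte strings should show whether this is what is happening.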