Key: XALANJ-2132
Type: Bug
Status: Resolved
Resolution: Duplicate
Priority: Major
Assignee: Unassigned
Reporter: Adam Lally
Votes: 0
Watchers: 0
XalanJ2

Incorrect serialization of supplementary Unicode characters

Created: 01/Jun/05 05:36 AM   Updated: 09/Aug/05 02:46 PM
Component/s: None
Affects Version/s: 2.6
Fix Version/s: 2.7

Time Tracking:
Not Specified

File Attachments:
XsltSerializeSurrogates.java (Java source file, 1 kB), attached 2005-06-01 05:37 AM by Adam Lally
Environment: Windows XP, Sun JRE 1.5.0_03

Resolution Date: 08/Jun/05 07:17 AM


Description:
Passing a surrogate pair to a TransformerHandler causes incorrect UTF-8 to be generated. The following code illustrates the problem:


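    import javax.xml.transform.sax.SAXTransformerFactory;
    import javax.xml.transform.sax.TransformerHandler;
    import javax.xml.transform.stream.StreamResult;
    import org.xml.sax.helpers.AttributesImpl;

    public class XsltSerializeSurrogates {
        public static void main(String[] args) throws Exception {
            SAXTransformerFactory transformerFactory = (SAXTransformerFactory)SAXTransformerFactory.newInstance();
            TransformerHandler handler = transformerFactory.newTransformerHandler();
            handler.setResult(new StreamResult(System.out));

            // A valid UTF-16 surrogate pair: high surrogate, then low surrogate.
            char[] chars = new char[2];
            chars[0] = (char)0xD803;
            chars[1] = (char)0xDD75;

            // Emit <foo> containing only the surrogate pair.
            handler.startDocument();
            handler.startElement("","","foo", new AttributesImpl());
            handler.characters(chars, 0, chars.length);
            handler.endElement("","","foo");
            handler.endDocument();
        }
    }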


If you take the output of this program and try to parse it with the Xerces SAX Parser, you get an "Invalid byte 2 of 3-byte UTF-8 sequence" exception.
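
For comparison, here is a minimal sketch of what a correct UTF-8 encoding of this surrogate pair looks like. It does not use Xalan at all. It relies only on the JDK's own encoder, and assumes Java 7 or later for StandardCharsets (the JRE 1.5 in this report would need getBytes("UTF-8") instead). The class name SurrogateUtf8Check and the helper utf8Hex are made up for illustration. The pair 0xD803, 0xDD75 combines into the single code point U+10D75, which well-formed UTF-8 represents as one four-byte sequence rather than two three-byte sequences.

```java
import java.nio.charset.StandardCharsets;

public class SurrogateUtf8Check {
    // Hypothetical helper: encode UTF-16 chars as UTF-8 and render the bytes as hex.
    static String utf8Hex(char[] chars) {
        byte[] bytes = new String(chars).getBytes(StandardCharsets.UTF_8);
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Same surrogate pair as in the report.
        char[] chars = { (char) 0xD803, (char) 0xDD75 };

        // Combine the high and low surrogate into one supplementary code point.
        int codePoint = Character.toCodePoint(chars[0], chars[1]);
        System.out.println("U+" + Integer.toHexString(codePoint).toUpperCase()); // prints U+10D75

        // Well-formed UTF-8 for that code point is a single 4-byte sequence.
        System.out.println(utf8Hex(chars)); // prints F0 90 B5 B5
    }
}
```

If the serializer's output for the text of <foo> contains anything other than these four bytes (or an equivalent numeric character reference such as &#x10D75;), a conforming parser is entitled to reject it.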






