XalanJ2 / XALANJ-2132

Incorrect serialization of supplementary Unicode characters

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: 2.6
    • Fix Version/s: 2.7
    • Component/s: None
    • Labels: None
    • Environment: Windows XP, Sun JRE 1.5.0_03

    Description

      Passing a surrogate pair to a TransformerHandler causes incorrect UTF-8 to be generated. The following code illustrates the problem:

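      import javax.xml.transform.sax.SAXTransformerFactory;
      import javax.xml.transform.sax.TransformerHandler;
      import javax.xml.transform.stream.StreamResult;
      import org.xml.sax.helpers.AttributesImpl;

      // Identity TransformerHandler that serializes SAX events to standard output
      SAXTransformerFactory transformerFactory = (SAXTransformerFactory) SAXTransformerFactory.newInstance();
      TransformerHandler handler = transformerFactory.newTransformerHandler();
      handler.setResult(new StreamResult(System.out));

      // Surrogate pair D803 DD75, i.e. the supplementary code point U+10D75
      char[] chars = new char[2];
      chars[0] = (char) 0xD803;
      chars[1] = (char) 0xDD75;

      handler.startDocument();
      handler.startElement("", "", "foo", new AttributesImpl());
      handler.characters(chars, 0, chars.length);
      handler.endElement("", "", "foo");
      handler.endDocument();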

      If you take the output of this program and try to parse it with the Xerces SAX Parser, you get an "Invalid byte 2 of 3-byte UTF-8 sequence" exception.
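
      For reference, the following sketch shows what well-formed UTF-8 for this surrogate pair looks like, using only the JDK (Java 7 or later for StandardCharsets). The class name SurrogatePairUtf8 and its helper methods are hypothetical and are not part of Xalan. The method encodeUnitAsThreeBytes is only an assumption about how a serializer might go wrong (encoding each UTF-16 code unit separately); the report does not confirm that this is what Xalan actually emits, although such output would be consistent with the error message quoted above.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.StandardCharsets;

public class SurrogatePairUtf8 {

    // Formats bytes as space-separated uppercase hex.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    // Well-formed encoding: the JDK combines the pair into one code point
    // and emits a single 4-byte UTF-8 sequence.
    static byte[] encodePair(char high, char low) {
        return new String(new char[] {high, low}).getBytes(StandardCharsets.UTF_8);
    }

    // ASSUMPTION (illustration only): a faulty encoder that treats a single
    // UTF-16 code unit as if it were a complete BMP character and writes it
    // as a 3-byte sequence.
    static byte[] encodeUnitAsThreeBytes(char c) {
        return new byte[] {
            (byte) (0xE0 | (c >> 12)),
            (byte) (0x80 | ((c >> 6) & 0x3F)),
            (byte) (0x80 | (c & 0x3F))
        };
    }

    // Returns true if a strict UTF-8 decoder accepts the bytes.
    static boolean isWellFormedUtf8(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder().decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        char high = (char) 0xD803;
        char low = (char) 0xDD75;

        // The pair represents one supplementary code point.
        int codePoint = Character.toCodePoint(high, low);
        System.out.println("Code point: U+" + Integer.toHexString(codePoint).toUpperCase());

        byte[] good = encodePair(high, low);
        System.out.println("Expected UTF-8: " + hex(good)
                + " well-formed=" + isWellFormedUtf8(good));

        // Concatenate the two separately encoded surrogates.
        byte[] hi3 = encodeUnitAsThreeBytes(high);
        byte[] lo3 = encodeUnitAsThreeBytes(low);
        byte[] bad = new byte[hi3.length + lo3.length];
        System.arraycopy(hi3, 0, bad, 0, hi3.length);
        System.arraycopy(lo3, 0, bad, hi3.length, lo3.length);
        System.out.println("Per-surrogate bytes: " + hex(bad)
                + " well-formed=" + isWellFormedUtf8(bad));
    }
}
```

      Comparing the bytes actually written by the serializer against the 4-byte sequence produced by encodePair is a quick way to confirm whether the output is malformed before handing it to a parser.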

      Attachments

        1. XsltSerializeSurrogates.java
          1 kB
          Adam P. Lally

          People

            Assignee: Unassigned
            Reporter: Adam P. Lally (alally)
