Details
- Type: Bug
- Status: Resolved
- Priority: Major
- Resolution: Duplicate
- Affects Version/s: 2.7.1, 2.7.2
- Fix Version/s: None
- Security Level: No security risk; visible to anyone (Ordinary problems in Xalan projects. Anybody can view the issue.)
- Labels: None
Description
When serializing XML whose text contains a character encoded as a Unicode surrogate pair, such as "\uD840\uDC0B", I tried several approaches and none worked. The XML Transformer emits each half of the surrogate pair as a separate numeric character reference, which makes the resulting XML unparseable, e.g.: SAXParseException; Character reference "&#55360" is an invalid XML character. It looks like a bug introduced by the XALANJ-2271 fix.
kec@phoebe:~/Downloads$ java -version
java version "1.8.0_171"
Java(TM) SE Runtime Environment (build 1.8.0_171-b11)
Java HotSpot(TM) 64-Bit Server VM (build 25.171-b11, mixed mode)
kec@phoebe:~/Downloads$ java -cp /home/kec/.m2/repository/xml-apis/xml-apis/1.4.01/xml-apis-1.4.01.jar:/home/kec/.m2/repository/xalan/xalan/2.7.2/xalan-2.7.2.jar:/home/kec/.m2/repository/xalan/serializer/2.7.2/serializer-2.7.2.jar:. JI9053942
Character: 𠀋
EXPECTED: <?xml version="1.0" encoding="UTF-8"?><a>𠀋</a>
 ACTUAL: <?xml version="1.0" encoding="UTF-8"?><a>&#55360;&#56331;</a>
[Fatal Error] :1:50: Character reference "&#
kec@phoebe:~/Downloads$ java -cp /home/kec/.m2/repository/xml-apis/xml-apis/1.4.01/xml-apis-1.4.01.jar:/home/kec/.m2/repository/xalan/xalan/2.7.0/xalan-2.7.0.jar:/home/kec/.m2/repository/xalan/serializer/2.7.0/serializer-2.7.0.jar:. JI9053942
Character: 𠀋
EXPECTED: <?xml version="1.0" encoding="UTF-8"?><a>𠀋</a>
 ACTUAL: <?xml version="1.0" encoding="UTF-8"?><a>𠀋</a>
ACTUAL PARSED CHAR 𠀋
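import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

public class JI9053942 {
    public static void main(String[] args) throws Exception {
        // U+2000B, stored in Java as a surrogate pair
        String value = "\uD840\uDC0B";
        System.out.println("Character: " + value);
        System.out.println("EXPECTED: <?xml version=\"1.0\" encoding=\"UTF-8\"?><a>&#" + value.codePointAt(0) + ";</a>");

        // Build <a>value</a> as a DOM tree
        final DocumentBuilder documentBuilder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document dom = documentBuilder.newDocument();
        final Element rootEl = dom.createElement("a");
        rootEl.setTextContent(value);
        dom.appendChild(rootEl);

        // Serialize with the identity transformer
        StringWriter writer = new StringWriter();
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.transform(new DOMSource(dom), new StreamResult(writer));
        String xml = writer.toString();
        System.out.println(" ACTUAL: " + xml);

        // Parse the output back; this fails when each surrogate is escaped separately
        InputSource inputSource = new InputSource();
        inputSource.setCharacterStream(new StringReader(xml));
        System.out.println("ACTUAL PARSED CHAR " + documentBuilder.parse(inputSource).getDocumentElement().getTextContent());
    }
}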
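To make the difference between the two outputs concrete, here is a minimal sketch that escapes the same string in two ways: once per code point and once per UTF-16 char. It is not Xalan's serializer code; the class name `SurrogateNcrSketch` and both method names are hypothetical, and the sketch ignores the escaping of `&` and `<` that a real serializer must also perform. It only illustrates why a per-char loop produces two references that no XML parser will accept, while a per-code-point loop produces a single valid one.

```java
public class SurrogateNcrSketch {

    /** Hypothetical helper: one decimal character reference per non-ASCII code point. */
    public static String escapeByCodePoint(String text) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);          // joins a valid surrogate pair into one code point
            if (cp > 0x7F) {
                out.append("&#").append(cp).append(';');
            } else {
                out.appendCodePoint(cp);
            }
            i += Character.charCount(cp);          // advance by 1 or 2 UTF-16 chars
        }
        return out.toString();
    }

    /** Hypothetical helper: one reference per UTF-16 char, which splits surrogate pairs. */
    public static String escapeByChar(String text) {
        StringBuilder out = new StringBuilder();
        for (char c : text.toCharArray()) {
            if (c > 0x7F) {
                out.append("&#").append((int) c).append(';');
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String value = "\uD840\uDC0B";                    // the character from this report
        System.out.println(escapeByCodePoint(value));     // prints &#131083;
        System.out.println(escapeByChar(value));          // prints &#55360;&#56331;
    }
}
```

The second form is rejected on parsing because 55360 (0xD840) and 56331 (0xDC0B) lie in the surrogate range, which XML excludes from its legal characters, so a character reference may not name them.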
Attachments
Issue Links
- is caused by
  XALANJ-2271 XML 1.1 Serialization, char in attribute value not escaped (Closed)
- is related to
  XALANJ-2419 Astral characters written as a pair of NCRs with the surrogate scalar values when using UTF-8 (Resolved)
- links to