Details
Bug
Status: Open
Major
Resolution: Unresolved
2.7.2
None
Security Level: No security risk; visible to anyone (Ordinary problems in Xalan projects. Anybody can view the issue.)
None
Win 7 x64, Java 1.6
PatchAvailable
fp1
Description
In Xalan 2.7.2, supplementary characters (see http://www.oracle.com/technetwork/articles/javase/supplementary-142654.html for details) are output incorrectly in attribute values.
For example, I need to output the symbols 𣎴 (&#144308;) and 𠘨 (&#132648;) in attribute "y" of element "x".
Expected result:
<?xml version="1.0" encoding="UTF-8"?><x y="𣎴 - 𠘨"/>
Actual result for Xalan 2.7.2 is:
<?xml version="1.0" encoding="UTF-8"?><x y="&#55372;&#57268; - &#55361;&#56872;"/>
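Both symbols lie outside the Basic Multilingual Plane, so Java stores each one as two UTF-16 code units (a surrogate pair). The following is a minimal sketch of that arithmetic using only the JDK; the class and method names are hypothetical and are not part of Xalan.

```java
import java.util.Arrays;

// Sketch only: SurrogatePairDemo is a hypothetical name, not Xalan code.
class SurrogatePairDemo {
    /** Returns the UTF-16 code units of a code point as decimal numbers. */
    static int[] utf16Units(int codePoint) {
        char[] units = Character.toChars(codePoint);
        int[] out = new int[units.length];
        for (int i = 0; i < units.length; i++) {
            out[i] = units[i];
        }
        return out;
    }

    public static void main(String[] args) {
        // 144308 and 132648 are the decimal code points from the report.
        for (int cp : new int[] {144308, 132648}) {
            String s = new String(Character.toChars(cp));
            System.out.println("U+" + Integer.toHexString(cp).toUpperCase()
                    + " length=" + s.length()
                    + " units=" + Arrays.toString(utf16Units(cp)));
        }
    }
}
```

Running it should print `U+233B4 length=2 units=[55372, 57268]` and `U+20628 length=2 units=[55361, 56872]`, so any code that walks the string one char at a time sees two code units per symbol.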
Code snippet for test:
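import java.io.StringReader;
import javax.xml.transform.Result;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// The class name is arbitrary; only main() comes from the report.
public class SupplementaryAttrTest {
    public static void main(String[] argv) throws Exception {
        TransformerFactory tFactory = TransformerFactory.newInstance();
        // Stylesheet that copies value1 into attribute "y" of element "x"
        StreamSource stylesource = new StreamSource(new StringReader(
            "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
            + "<xsl:stylesheet xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\" version=\"1.0\" >"
            + "<xsl:template match=\"/\"><x y=\"{xslt/search/value1}\" /></xsl:template>"
            + "</xsl:stylesheet>"));
        Transformer transformer = tFactory.newTransformer(stylesource);
        // Input document containing the two supplementary characters
        StreamSource source = new StreamSource(new StringReader(
            "<?xml version=\"1.0\"?><xslt><search><value1>𣎴 - 𠘨</value1></search></xslt>"));
        Result result = new StreamResult(System.out);
        transformer.transform(source, result);
    }
}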
The problem relates to the method org.apache.xml.serializer.ToStream.writeAttrString(Writer, String, String).
if (m_charInfo.shouldMapAttrChar(ch)) {
    // The character is supposed to be replaced by a String
    // e.g. '&' --> "&amp;"
    // e.g. '<' --> "&lt;"
    accumDefaultEscape(writer, ch, i, stringChars, len, false, true);
}
This part does not handle multi-char sequences such as supplementary characters, which the Java platform represents as surrogate pairs, so each surrogate ends up in the following part of the same method:
else {
    // This is a fallback plan, we should never get here
    // but if the character wasn't previously handled
    // (i.e. isn't in the encoding, etc.) then what
    // should we do? We choose to write out a character ref
    writer.write("&#");
    writer.write(Integer.toString(ch));
    writer.write(';');
}
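To illustrate why this fallback breaks supplementary characters, here is a minimal standalone sketch that applies the same three writes to every char of a string. It is not the Xalan code path itself; `NaiveCharRefDemo` and `perCharRefs` are hypothetical names used only for this illustration.

```java
// Sketch only: mimics the fallback's "&#" + Integer.toString(ch) + ";"
// for every UTF-16 code unit. Not part of Xalan.
class NaiveCharRefDemo {
    /** Writes each char of s as its own decimal character reference. */
    static String perCharRefs(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char ch = s.charAt(i);
            sb.append("&#");
            sb.append(Integer.toString(ch)); // char is widened to int
            sb.append(';');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // U+233B4 (decimal 144308), built from its code point
        String symbol = new String(Character.toChars(144308));
        System.out.println(perCharRefs(symbol));
    }
}
```

For the first symbol this should print `&#55372;&#57268;`: two references to lone surrogates, which are not valid XML characters, instead of the single reference `&#144308;`.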
PS: I can't attach a patch file, so the patch is included here.
--- src\org\apache\xml\serializer\ToStream.java 2014-03-26 17:21:30 +0200
+++ src\org\apache\xml\serializer\ToStream.java 2014-09-09 19:09:30 +0300
@@ -2112,8 +2112,13 @@
                     // e.g. '&' --> "&amp;"
                     // e.g. '<' --> "&lt;"
                     accumDefaultEscape(writer, ch, i, stringChars, len, false, true);
-                }
-                else {
+                } else if (Encodings.isHighUTF16Surrogate(ch)) {
+                    // more than single input character can be processed
+                    // within accumDefaultEscape()
+                    // so we set appropriate value for loop for().
+                    i = accumDefaultEscape(writer, ch, i, stringChars, len, false, true);
+
+                } else {
                     if (0x0 <= ch && ch <= 0x1F) {
                         // Range 0x00 through 0x1F inclusive
                         // This covers the non-whitespace control characters
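The idea behind the patch (detect a high surrogate, consume both code units, and advance the loop index) can be sketched outside Xalan as follows. This is only an illustration under my own assumptions: `appendCharRef` and `escapeNonAscii` are hypothetical helpers, not the real `accumDefaultEscape()`, and the sketch always writes a numeric reference, whereas what Xalan writes for a given output encoding is not shown here.

```java
// Sketch only: hypothetical helpers illustrating a surrogate-aware loop.
class SurrogateAwareEscapeDemo {
    /**
     * Appends a decimal character reference for the character starting at
     * index i and returns the index of the last code unit consumed, so the
     * caller can assign it back to the loop variable (as the patch does).
     */
    static int appendCharRef(StringBuilder out, char[] chars, int i, int len) {
        char ch = chars[i];
        if (Character.isHighSurrogate(ch) && i + 1 < len
                && Character.isLowSurrogate(chars[i + 1])) {
            // Combine the pair into one code point and consume both units.
            int codePoint = Character.toCodePoint(ch, chars[i + 1]);
            out.append("&#").append(codePoint).append(';');
            return i + 1;
        }
        // Single code unit (including an unpaired surrogate).
        out.append("&#").append((int) ch).append(';');
        return i;
    }

    /** Copies ASCII as-is and writes everything else as a reference. */
    static String escapeNonAscii(String s) {
        char[] chars = s.toCharArray();
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < chars.length; i++) {
            if (chars[i] < 0x80) {
                out.append(chars[i]);
            } else {
                i = appendCharRef(out, chars, i, chars.length);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String value = new String(Character.toChars(144308)) + " - "
                + new String(Character.toChars(132648));
        System.out.println(escapeNonAscii(value));
    }
}
```

With the test value from this report the sketch should print `&#144308; - &#132648;`, which an XML parser reads back as the two original symbols.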