XalanJ2 / XALANJ-2593

Supplementary characters are serialized incorrectly in attributes


Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 2.7.2
    • Fix Version/s: None
    • Component/s: Serialization
    • Security Level: No security risk; visible to anyone (Ordinary problems in Xalan projects. Anybody can view the issue.)
    • Labels: None
    • Environment: Win 7 x64, Java 1.6
    • PatchAvailable
    • fp1

    Description

      In Xalan 2.7.2, supplementary characters (see http://www.oracle.com/technetwork/articles/javase/supplementary-142654.html for details) are serialized incorrectly in attribute values.
      For example, I need to output the characters 𣎴 (&#144308;) and 𠘨 (&#132648;) in attribute "y" of element "x".
      Expected result:

      <?xml version="1.0" encoding="UTF-8"?><x y="&#144308; - &#132648;"/>

      Actual result for Xalan 2.7.2 is:

       <?xml version="1.0" encoding="UTF-8"?><x y="&#55372;&#57268; - &#55361;&#56872;"/>
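The numbers in the actual result are the two UTF-16 surrogate halves of each expected code point, written out separately. A minimal sketch that confirms this with the standard `java.lang.Character` API; the class name `SurrogatePairCheck` and the helper `combine` are illustrative only and not part of Xalan:

```java
class SurrogatePairCheck {
    // Combines a UTF-16 high/low surrogate pair into the code point it encodes.
    static int combine(int high, int low) {
        return Character.toCodePoint((char) high, (char) low);
    }

    public static void main(String[] args) {
        // Values taken from the actual Xalan 2.7.2 output quoted above.
        System.out.println(combine(55372, 57268)); // prints 144308
        System.out.println(combine(55361, 56872)); // prints 132648
    }
}
```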

      Code snippet for test:

      import java.io.StringReader;
      import javax.xml.transform.Transformer;
      import javax.xml.transform.TransformerFactory;
      import javax.xml.transform.stream.StreamResult;
      import javax.xml.transform.stream.StreamSource;

      public class SupplementaryAttrTest { // class name is illustrative
          public static void main(String[] argv) throws Exception {
              TransformerFactory tFactory = TransformerFactory.newInstance();
              StreamSource stylesource = new StreamSource(new StringReader("<?xml version=\"1.0\" encoding=\"UTF-8\"?><xsl:stylesheet xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\" version=\"1.0\" ><xsl:template match=\"/\"><x y=\"{xslt/search/value1}\" /></xsl:template></xsl:stylesheet>"));
              Transformer transformer = tFactory.newTransformer(stylesource);
              StreamSource source = new StreamSource(new StringReader("<?xml version=\"1.0\"?><xslt><search><value1>𣎴 - 𠘨</value1></search></xslt>"));
              StreamResult result = new StreamResult(System.out);
              transformer.transform(source, result);
          }
      }
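To turn the test snippet into an automated check, the serialized output can be scanned for character references that fall in the surrogate range, since such references are not well-formed XML. The following is only a sketch: `SurrogateRefDetector` and `hasSurrogateRef` are hypothetical names, and only decimal references (the form seen in this report) are examined.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SurrogateRefDetector {
    // Matches decimal character references such as &#55372;
    // (at most 7 digits, so Integer.parseInt cannot overflow).
    private static final Pattern DECIMAL_REF = Pattern.compile("&#(\\d{1,7});");

    // Returns true if the text contains a decimal character reference whose
    // value lies in the UTF-16 surrogate range U+D800..U+DFFF.
    static boolean hasSurrogateRef(String xml) {
        Matcher m = DECIMAL_REF.matcher(xml);
        while (m.find()) {
            int value = Integer.parseInt(m.group(1));
            if (value >= 0xD800 && value <= 0xDFFF) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String actual = "<x y=\"&#55372;&#57268; - &#55361;&#56872;\"/>";
        String expected = "<x y=\"&#144308; - &#132648;\"/>";
        System.out.println(hasSurrogateRef(actual));   // prints true
        System.out.println(hasSurrogateRef(expected)); // prints false
    }
}
```

To apply it to the test above, the transformer output could be captured with `new StreamResult(new StringWriter())` instead of `System.out` and the resulting string passed to `hasSurrogateRef`.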
      

      The problem relates to the method org.apache.xml.serializer.ToStream.writeAttrString(Writer, String, String).

                  if (m_charInfo.shouldMapAttrChar(ch)) {
                      // The character is supposed to be replaced by a String
                      // e.g.   '&'  -->  "&amp;"
                      // e.g.   '<'  -->  "&lt;"
                      accumDefaultEscape(writer, ch, i, stringChars, len, false, true);
                  }
      

      This branch does not handle multi-character sequences such as supplementary characters, which the Java platform represents as surrogate pairs, so each half of the pair ends up in the following branch of the same method:

                  else {
                          // This is a fallback plan, we should never get here
                          // but if the character wasn't previously handled
                          // (i.e. isn't in the encoding, etc.) then what
                          // should we do?  We choose to write out a character ref
                          writer.write("&#");
                          writer.write(Integer.toString(ch));
                          writer.write(';');
                      }
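For comparison, here is a minimal standalone sketch of surrogate-aware escaping. It is not the Xalan implementation: `AttrEscapeSketch` and `escapeNonAscii` are hypothetical names, and escaping of `&`, `<` and `"` is deliberately left out so that only the surrogate handling is visible. The key point is that a high surrogate followed by a low surrogate is combined into one code point and the loop index is advanced past both chars.

```java
class AttrEscapeSketch {
    // Hypothetical helper, not Xalan code: writes every non-ASCII character
    // as a decimal character reference, combining surrogate pairs first.
    static String escapeNonAscii(String value) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < value.length(); i++) {
            char ch = value.charAt(i);
            if (Character.isHighSurrogate(ch)
                    && i + 1 < value.length()
                    && Character.isLowSurrogate(value.charAt(i + 1))) {
                // Two chars form one supplementary code point.
                int codePoint = Character.toCodePoint(ch, value.charAt(i + 1));
                out.append("&#").append(codePoint).append(';');
                i++; // skip the low surrogate that was just consumed
            } else if (ch > 0x7F) {
                // Single non-ASCII char (an unpaired surrogate also lands here).
                out.append("&#").append((int) ch).append(';');
            } else {
                out.append(ch);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // U+233B4 and U+20628, the two characters from this report.
        String input = "\uD84C\uDFB4 - \uD841\uDE28";
        System.out.println(escapeNonAscii(input)); // prints &#144308; - &#132648;
    }
}
```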
      

      PS: I can't attach a patch file, so I'm including it here.

      --- src\org\apache\xml\serializer\ToStream.java	2014-03-26 17:21:30 +0200
      +++ src\org\apache\xml\serializer\ToStream.java	2014-09-09 19:09:30 +0300
      @@ -2112,8 +2112,13 @@
                       // e.g.   '&'  -->  "&amp;"
                       // e.g.   '<'  -->  "&lt;"
                       accumDefaultEscape(writer, ch, i, stringChars, len, false, true);
      -            }
      -            else {
      +            } else if (Encodings.isHighUTF16Surrogate(ch)) {
      +                // more than single input character can be processed
      +                // within accumDefaultEscape()
      +                // so we set appropriate value for loop for().
      +                i = accumDefaultEscape(writer, ch, i, stringChars, len, false, true); 
      +
      +            } else {
                       if (0x0 <= ch && ch <= 0x1F) {
                           // Range 0x00 through 0x1F inclusive
                           // This covers the non-whitespace control characters
      

          People

            Assignee: Steven J. Hathaway (shathaway)
            Reporter: Eugene Shkel (eshkel)
            Votes: 0
            Watchers: 4

              Time Tracking

                Original Estimate: 24h
                Remaining Estimate: 24h
                Time Spent: Not Specified