XalanJ2 / XALANJ-2560

ToXMLStream does not support Unicode supplementary characters

    Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 2.7.1
    • Fix Version/s: None
    • Component/s: Serialization
    • Security Level: No security risk; visible to anyone (Ordinary problems in Xalan projects. Anybody can view the issue.)
    • Environment:
      Xalan 2.7.1 serializer.
      Tested on Ubuntu 12.04 with Oracle JDK 1.7.0_05.

      Description

      org.apache.xml.serializer.ToXMLStream (which extends ToStream) does not correctly serialize Unicode supplementary characters such as U+1D49C. It writes each UTF-16 surrogate as its own character reference, producing invalid output like "&#55349;&#56476;" instead of "&#119964;" (or the bytes F0 9D 92 9C in UTF-8). ToXMLStream is used by LSSerializer when Xalan's serializer is on the classpath.
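
For illustration, the difference between the two outputs comes down to whether the escaper works per UTF-16 char or per code point. The sketch below is not Xalan's code: the class and method names (SupplementaryEscape, escapePerChar, escapePerCodePoint) are hypothetical, and only non-ASCII escaping is shown (no handling of "&" or "<"). In Java, U+1D49C is stored as the surrogate pair D835 DC9C.

```java
class SupplementaryEscape {

    /** Per-char escaping (hypothetical): each surrogate becomes its own reference. */
    public static String escapePerChar(String text) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c < 0x80) {
                out.append(c);
            } else {
                // For a surrogate this writes a reference to U+D800..U+DFFF,
                // which is not a legal XML character.
                out.append("&#").append((int) c).append(';');
            }
        }
        return out.toString();
    }

    /** Per-code-point escaping (hypothetical): a surrogate pair becomes one reference. */
    public static String escapePerCodePoint(String text) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i); // joins a valid high/low surrogate pair
            if (cp < 0x80) {
                out.append((char) cp);
            } else {
                out.append("&#").append(cp).append(';');
            }
            i += Character.charCount(cp); // advance 2 chars for supplementary code points
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String scriptA = "\uD835\uDC9C"; // U+1D49C
        System.out.println(escapePerChar(scriptA));      // &#55349;&#56476;
        System.out.println(escapePerCodePoint(scriptA)); // &#119964;
    }
}
```
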

      org.apache.xml.serialize.DOMSerializerImpl (included in Xerces) does not have this problem, but it has been deprecated since Xerces 2.9.0, so this is a regression.

      See http://stackoverflow.com/questions/11952289/serializing-supplementary-unicode-characters-into-xml-documents-with-java for more details.
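
A minimal reproduction sketch, assuming the DOM Level 3 LSSerializer path described above. The class name LsSerializerRepro and the helper serialize are hypothetical. Which serializer implementation actually runs, and therefore what is printed, depends on the JDK and on whether Xalan's serializer jar is on the classpath, so no expected output is given.

```java
import java.io.ByteArrayOutputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSOutput;
import org.w3c.dom.ls.LSSerializer;

class LsSerializerRepro {

    /** Builds <root>text</root> and serializes it through LSSerializer in the given encoding. */
    public static String serialize(String text, String encoding) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().newDocument();
            Element root = doc.createElement("root");
            root.setTextContent(text);
            doc.appendChild(root);

            // Ask the DOM implementation for its Load/Save feature.
            DOMImplementationLS ls = (DOMImplementationLS)
                    doc.getImplementation().getFeature("LS", "3.0");
            LSSerializer serializer = ls.createLSSerializer();

            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            LSOutput output = ls.createLSOutput();
            output.setByteStream(bytes);
            output.setEncoding(encoding);
            serializer.write(doc, output);
            return bytes.toString(encoding);
        } catch (Exception e) {
            throw new IllegalStateException("serialization failed", e);
        }
    }

    public static void main(String[] args) {
        // U+1D49C as a Java surrogate pair; inspect the output for
        // "&#55349;&#56476;" (the reported bug) versus a single character or reference.
        System.out.println(serialize("\uD835\uDC9C", "UTF-8"));
    }
}
```
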

          Activity

          Christopher Taylor added a comment (edited)

          Possible duplicate of XALANJ-2419, which has a patch attached. Also referenced in http://stackoverflow.com/questions/10511474/surrogate-pair-handling-in-xalan-2-7-1


            People

            • Assignee: Unassigned
            • Reporter: Damien Guillaume
            • Votes: 4
            • Watchers: 4
