  1. Xerces-C++
  2. XERCESC-2130

UTF-16 surrogate values 0xD800-0xDFFF can no longer be written with Xerces 3.2.0 (e.g. emoticons)

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 3.2.0
    • Fix Version/s: 3.2.1
    • Component/s: DOM
    • Labels:
      None

      Description

      Solution for XERCESC-1854 introduced method
      DOMLSSerializerImpl::ensureValidString
      which has an error in validation.
      The method validates XMLCh values, which represent UTF-16 code units.

      The valid characters #x9 | #xA | #xD | #x20-#xD7FF | #xE000-#xFFFD | #x10000-#x10FFFF
      are the valid UTF-32 characters (code points).

      The UTF-16 surrogate range #xD800-#xDFFF is used to represent #x10000-#x10FFFF and should not be handled as invalid.
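
      To illustrate why surrogate code units must be accepted, here is a minimal standalone sketch of the standard UTF-16 encoding of a supplementary code point. The helper names toSurrogatePair and fromSurrogatePair are hypothetical and are not part of Xerces-C++; char16_t stands in for XMLCh.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>

// Hypothetical helper (not a Xerces-C++ API): split a supplementary code point
// (U+10000..U+10FFFF) into its UTF-16 high/low surrogate pair.
inline std::pair<char16_t, char16_t> toSurrogatePair(std::uint32_t codePoint)
{
    assert(codePoint >= 0x10000 && codePoint <= 0x10FFFF);
    const std::uint32_t offset = codePoint - 0x10000;  // 20 significant bits remain
    const char16_t high = static_cast<char16_t>(0xD800 + (offset >> 10));    // top 10 bits
    const char16_t low  = static_cast<char16_t>(0xDC00 + (offset & 0x3FF));  // bottom 10 bits
    return {high, low};
}

// Hypothetical inverse: recombine a high/low surrogate pair into a code point.
inline std::uint32_t fromSurrogatePair(char16_t high, char16_t low)
{
    assert(high >= 0xD800 && high <= 0xDBFF);
    assert(low >= 0xDC00 && low <= 0xDFFF);
    return 0x10000
         + ((static_cast<std::uint32_t>(high) - 0xD800) << 10)
         + (static_cast<std::uint32_t>(low) - 0xDC00);
}
```

      For example, the emoticon U+1F600 is stored as the two code units 0xD83D 0xDE00, both of which fall inside the range that the serializer currently rejects.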

      The reader treats this correctly and does not complain, which leads to asymmetric behavior:

      Reading DOM => OK
      Save back DOM => Exception

      I tried to attach an example to show the behavior.

      The method used,
      bool XMLChar1_1::isXMLChar(const XMLCh toCheck, const XMLCh toCheck2)
      already has an optional second parameter for checking surrogate values.
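
      A surrogate-aware check could look roughly like the sketch below. This is not the content of fix.patch and not the Xerces-C++ implementation; it is a standalone illustration that assumes the character ranges listed above, uses char16_t in place of XMLCh, and introduces the hypothetical names isSingleUnitXmlChar and isValidXmlUtf16.

```cpp
#include <cstddef>
#include <string>

inline bool isHighSurrogate(char16_t c) { return c >= 0xD800 && c <= 0xDBFF; }
inline bool isLowSurrogate(char16_t c)  { return c >= 0xDC00 && c <= 0xDFFF; }

// Characters that are valid on their own, i.e. without a surrogate partner.
inline bool isSingleUnitXmlChar(char16_t c)
{
    return c == 0x9 || c == 0xA || c == 0xD
        || (c >= 0x20 && c <= 0xD7FF)
        || (c >= 0xE000 && c <= 0xFFFD);
}

// Hypothetical validator: accepts well-formed surrogate pairs (which encode
// #x10000-#x10FFFF) and rejects lone surrogates and other invalid code units.
inline bool isValidXmlUtf16(const std::u16string& text)
{
    for (std::size_t i = 0; i < text.size(); ++i) {
        const char16_t c = text[i];
        if (isHighSurrogate(c)) {
            // A high surrogate must be followed directly by a low surrogate.
            if (i + 1 >= text.size() || !isLowSurrogate(text[i + 1]))
                return false;
            ++i;  // consume the low surrogate as part of the same character
        } else if (!isSingleUnitXmlChar(c)) {
            return false;  // also covers a low surrogate without a preceding high one
        }
    }
    return true;
}
```

      The point of the sketch is that the check has to look at two code units at a time whenever it meets a high surrogate, which is what the two-argument isXMLChar overload allows.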

        Attachments

        1. fix.patch
          1 kB
          Andreas Krantz
        2. patch_.cpp
          1 kB
          Andreas Krantz
        3. reproduce.cpp
          2 kB
          Andreas Krantz

            People

            • Assignee:
              scantor Scott Cantor
              Reporter:
              akrantz Andreas Krantz
            • Votes:
              0
              Watchers:
              3
