  1. Xerces-C++
  2. XERCESC-2130

UTF-16 surrogate values 0xD800-0xDFFF can no longer be written with Xerces 3.2.0 (e.g. emoticons)


Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 3.2.0
    • Fix Version/s: 3.2.1
    • Component/s: DOM
    • Labels: None

    Description

      The fix for XERCESC-1854 introduced the method
      DOMLSSerializerImpl::ensureValidString,
      whose validation contains an error.
      The method validates XMLCh values, which represent UTF-16 code units.

      The valid characters #x9 | #xA | #xD | #x20-#xD7FF | #xE000-#xFFFD | #x10000-#x10FFFF
      are defined in terms of UTF-32 characters (code points).

      The UTF-16 surrogate range #xD800-#xDFFF is used to represent #x10000-#x10FFFF and should not be treated as invalid.
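
To illustrate why surrogate code units must be accepted, here is a minimal standalone sketch of the standard UTF-16 surrogate mapping. It does not use Xerces-C++; the function names toSurrogatePair and fromSurrogatePair are hypothetical and chosen only for this example.

```cpp
#include <cstdint>
#include <utility>

// Split a supplementary code point (U+10000..U+10FFFF) into its
// UTF-16 high and low surrogate code units.
// Hypothetical helper; the caller must pass a value in that range.
std::pair<char16_t, char16_t> toSurrogatePair(std::uint32_t codePoint) {
    const std::uint32_t offset = codePoint - 0x10000;  // 20-bit value
    const char16_t high = static_cast<char16_t>(0xD800 + (offset >> 10));
    const char16_t low  = static_cast<char16_t>(0xDC00 + (offset & 0x3FF));
    return {high, low};
}

// Recombine a high/low surrogate pair into the code point it encodes.
std::uint32_t fromSurrogatePair(char16_t high, char16_t low) {
    return 0x10000
         + ((static_cast<std::uint32_t>(high) - 0xD800) << 10)
         + (static_cast<std::uint32_t>(low) - 0xDC00);
}
```

For example, the emoticon U+1F600 becomes the two code units 0xD83D and 0xDE00. Both fall inside #xD800-#xDFFF, so a check that looks at each code unit in isolation rejects them.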

      The reader treats this correctly and does not complain, which leads to asymmetric behavior:

      Reading DOM => OK
      Save back DOM => Exception

      I tried to attach an example to show the behavior.

      The methods used, such as
      bool XMLChar1_1::isXMLChar(const XMLCh toCheck, const XMLCh toCheck2)
      already take an optional second parameter for checking surrogate values.
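
A surrogate-aware validation loop could look roughly like the sketch below. This is not the code from fix.patch or from Xerces-C++. It is a standalone illustration that assumes the character ranges quoted above, and isXmlCharUnit, isHighSurrogate, isLowSurrogate and isValidXmlUtf16 are hypothetical names.

```cpp
#include <cstddef>
#include <string>

// Hypothetical stand-in for a single-code-unit XML character check,
// using the ranges quoted in the description. Surrogates are rejected here.
bool isXmlCharUnit(char16_t c) {
    return c == 0x9 || c == 0xA || c == 0xD ||
           (c >= 0x20 && c <= 0xD7FF) ||
           (c >= 0xE000 && c <= 0xFFFD);
}

bool isHighSurrogate(char16_t c) { return c >= 0xD800 && c <= 0xDBFF; }
bool isLowSurrogate(char16_t c)  { return c >= 0xDC00 && c <= 0xDFFF; }

// Validate a UTF-16 string: a high surrogate is accepted only when it is
// immediately followed by a low surrogate; every other code unit must
// pass the single-unit check.
bool isValidXmlUtf16(const std::u16string& s) {
    for (std::size_t i = 0; i < s.size(); ++i) {
        if (isHighSurrogate(s[i])) {
            if (i + 1 >= s.size() || !isLowSurrogate(s[i + 1])) {
                return false;  // unpaired high surrogate
            }
            ++i;  // skip the low surrogate that completes the pair
        } else if (!isXmlCharUnit(s[i])) {
            return false;  // invalid unit, including a lone low surrogate
        }
    }
    return true;
}
```

With this approach a string containing U+1F600 is accepted, while a lone 0xD83D or 0xDE00 is still rejected, so writing matches what the reader already accepts.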

      Attachments

        1. reproduce.cpp
          2 kB
          Andreas Krantz
        2. patch_.cpp
          1 kB
          Andreas Krantz
        3. fix.patch
          1 kB
          Andreas Krantz

          People

            Assignee: Scott Cantor (scantor)
            Reporter: Andreas Krantz (akrantz)