Xerces-C++ / XERCESC-2120

DOM Serialization does not correctly validate Surrogate Pairs

Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 3.2.0, 3.2.1, 3.2.2, 3.2.3, 3.2.4
    • Fix Version/s: None
    • Component/s: DOM
    • Labels: None

    Description

      When attempting to write an XML document that contains valid UTF-16 surrogate pairs, an error occurs during character validation, which causes the write to fail.

      This issue appears to have been introduced with https://issues.apache.org/jira/browse/XERCESC-1854, in the following commit: http://svn.apache.org/viewvc/xerces/c/trunk/src/xercesc/dom/impl/DOMLSSerializerImpl.cpp?r1=768978&r2=1226891.

      I have supplied a reproducer (DOMCharacterValidationTest.cpp) and a potential patch (DomStringValidation.patch). The string validator should be responsible for determining whether a code unit is part of a surrogate pair. However, I would also argue that this may not be the right place to perform the string validation, because a failure at this point leaves the output document in an inconsistent (half-written) state.
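
      A minimal sketch of a pair-aware check of the kind described above. It is not the attached patch and does not use the Xerces-C++ API: the names isXml10Char and isValidXmlUtf16 are hypothetical, and std::u16string stands in for the serializer's UTF-16 buffer. The idea is to combine a high surrogate with the low surrogate that follows it and test the resulting code point against the XML 1.0 Char production, rather than testing each 16-bit unit on its own.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not part of Xerces-C++): true if `cp` matches the
// XML 1.0 Char production.
inline bool isXml10Char(char32_t cp) {
    return cp == 0x9 || cp == 0xA || cp == 0xD ||
           (cp >= 0x20 && cp <= 0xD7FF) ||
           (cp >= 0xE000 && cp <= 0xFFFD) ||
           (cp >= 0x10000 && cp <= 0x10FFFF);
}

// Hypothetical helper (not part of Xerces-C++): validates a whole UTF-16
// string, treating a high surrogate followed by a low surrogate as a single
// supplementary code point. Unpaired surrogates are rejected.
inline bool isValidXmlUtf16(const std::u16string& text) {
    for (std::size_t i = 0; i < text.size(); ++i) {
        const char16_t unit = text[i];
        if (unit >= 0xD800 && unit <= 0xDBFF) {
            // High surrogate: a low surrogate must follow immediately.
            if (i + 1 >= text.size()) {
                return false;
            }
            const char16_t next = text[i + 1];
            if (next < 0xDC00 || next > 0xDFFF) {
                return false;
            }
            // Combine the pair into one code point and validate that.
            const char32_t cp = 0x10000 +
                ((static_cast<char32_t>(unit) - 0xD800) << 10) +
                (static_cast<char32_t>(next) - 0xDC00);
            if (!isXml10Char(cp)) {
                return false;
            }
            ++i;  // Skip the low surrogate that was just consumed.
        } else if (unit >= 0xDC00 && unit <= 0xDFFF) {
            // Low surrogate with no preceding high surrogate.
            return false;
        } else if (!isXml10Char(unit)) {
            return false;
        }
    }
    return true;
}
```

      Because the sketch checks the complete string and only then returns, a caller could run it before emitting any output, which would avoid the half-written document mentioned above.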

      Attachments

        1. DomStringValidation.patch
          1 kB
          Andrew Blackton
        2. DOMCharacterValidationTest.cpp
          3 kB
          Andrew Blackton

          People

            Assignee: Unassigned
            Reporter: Andrew Blackton (ablackton)
            Votes: 1
            Watchers: 2
