Description
Bug found in Xerces-C++ version 3.1.4 (based on code review, newer versions are also affected).
How to reproduce: Run SAX2Print on the attached UTF8.xml file: "SAX2Print UTF8.xml".
One Chinese character is missing from the name attribute of the second-to-last Instance element.
Fix: The fix for this bug is included in the xerces.patch file.
XMLUTF8Transcoder.cpp already contains a check for this case, but it relies on the wrong assumption
that the number of bytes read is only updated at the end of the loop.
In fact, the number of bytes read (bytesEaten) is calculated from srcPtr, which has already been advanced by the time the check is made.
Therefore srcPtr needs to be moved back to the start of the character when the surrogate pair does not fit into the toFill buffer.
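
The effect described above can be reproduced with a small stand-alone sketch. This is not the Xerces-C++ code and not the content of xerces.patch: decodeBlock, transcodeAll and the rewindOnOverflow flag are hypothetical names made up for illustration, and the decoder assumes well-formed UTF-8. It only shows why a source pointer that has already been advanced must be moved back when a surrogate pair does not fit into the output buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for a UTF-8 to UTF-16 transcoding loop.
// Writes at most maxChars UTF-16 code units to out and returns the number of
// source bytes consumed (the analogue of bytesEaten in the report).
std::size_t decodeBlock(const unsigned char* src, std::size_t srcLen,
                        char16_t* out, std::size_t maxChars,
                        std::size_t& charsWritten, bool rewindOnOverflow)
{
    const unsigned char* srcPtr = src;
    const unsigned char* srcEnd = src + srcLen;
    char16_t* outPtr = out;
    char16_t* outEnd = out + maxChars;

    while (srcPtr < srcEnd && outPtr < outEnd) {
        const unsigned char first = *srcPtr;
        const std::size_t len = first < 0x80 ? 1 : first < 0xE0 ? 2 : first < 0xF0 ? 3 : 4;
        if (static_cast<std::size_t>(srcEnd - srcPtr) < len)
            break; // incomplete sequence at the end of the input

        // Assemble the code point from the lead byte and its continuation bytes.
        std::uint32_t cp = (len == 1) ? first : (first & (0xFFu >> (len + 1)));
        for (std::size_t i = 1; i < len; ++i)
            cp = (cp << 6) | (srcPtr[i] & 0x3Fu);

        srcPtr += len; // srcPtr is advanced BEFORE the space check below

        if (cp >= 0x10000) {
            if (outEnd - outPtr < 2) {
                // The surrogate pair does not fit. Without the rewind, the
                // bytes of this character are counted as consumed although
                // nothing was written, so the character is silently dropped.
                if (rewindOnOverflow)
                    srcPtr -= len;
                break;
            }
            cp -= 0x10000;
            *outPtr++ = static_cast<char16_t>(0xD800 + (cp >> 10));
            *outPtr++ = static_cast<char16_t>(0xDC00 + (cp & 0x3FF));
        } else {
            *outPtr++ = static_cast<char16_t>(cp);
        }
    }
    charsWritten = static_cast<std::size_t>(outPtr - out);
    return static_cast<std::size_t>(srcPtr - src);
}

// Hypothetical driver: transcodes the whole input in blocks of blockChars
// code units, the way a parser refills a fixed-size character buffer.
std::u16string transcodeAll(const std::string& utf8, std::size_t blockChars,
                            bool rewindOnOverflow)
{
    assert(blockChars >= 2); // a block must be able to hold one surrogate pair
    std::u16string result;
    std::vector<char16_t> buf(blockChars);
    const unsigned char* p = reinterpret_cast<const unsigned char*>(utf8.data());
    std::size_t remaining = utf8.size();
    while (remaining > 0) {
        std::size_t written = 0;
        const std::size_t eaten =
            decodeBlock(p, remaining, buf.data(), blockChars, written, rewindOnOverflow);
        if (eaten == 0)
            break; // no progress possible (truncated input)
        result.append(buf.data(), written);
        p += eaten;
        remaining -= eaten;
    }
    return result;
}
```

With the input "ab", followed by U+20000 (UTF-8 bytes F0 A0 80 80), followed by "c", and a block size of 3, the first block has room for only one more code unit when the 4-byte character is reached. With rewindOnOverflow set to false the character is lost and the result is "abc"; with it set to true the character is decoded at the start of the next block.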
Contributor related:
Author Name of the code being contributed: Johannes Willnecker
Employer: Siemens AG
I have the right to grant the copyright licenses for the contribution.
My employer has rights to the code that I have written. My employer gave me permission to contribute this code on its behalf.
I am not aware of any third-party license or other restrictions.