Details
-
Improvement
-
Status: Closed
-
Major
-
Resolution: Fixed
-
None
-
Security Level: No security risk; visible to anyone (Ordinary problems in Xalan projects. Anybody can view the issue.)
-
Patch
Description
XALANJ-2419 addressed a case where "astral" Unicode characters, which require a surrogate pair (two UTF-16 code units), were not being serialized correctly. We have a proposed fix for that.
An edge case reportedly remains: a surrogate pair that straddles a buffer boundary might not be handled correctly. maxfortun offered what looks like a reasonable fix (https://github.com/maxfortun/xalan-j/blob/a9bd5591d9f8a523548aeec091e886b64c691628/src/org/apache/xml/serializer/ToStream.java#L1607), but in my testing it did not serialize the surrogate pairs correctly and caused regressions in the tests that XALANJ-2419 introduced. I don't know whether that's because we're taking multiple paths through
But the edge case does appear to be real, and if so we will need some solution along these lines.
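To make the edge case concrete, here is a minimal sketch of one way to carry a dangling high surrogate from one buffer to the next. It is not the ToStream code and not maxfortun's patch; the class name `SurrogateBoundarySketch`, the `pendingHigh` field, and the `finish()` method are all hypothetical. It also assumes non-ASCII characters are written as decimal character references, and it substitutes U+FFFD for isolated surrogates purely as a placeholder, since that policy is a separate question.

```java
// Hypothetical illustration only; not part of org.apache.xml.serializer.
class SurrogateBoundarySketch {
    // High surrogate left over from the end of the previous chunk (0 = none).
    private char pendingHigh = 0;
    private final StringBuilder out = new StringBuilder();

    /** Accepts one chunk, in the shape of a SAX characters() callback. */
    public void characters(char[] buf, int start, int length) {
        int end = start + length;
        int i = start;

        // First, try to complete a pair that was split by the previous boundary.
        if (pendingHigh != 0 && i < end) {
            if (Character.isLowSurrogate(buf[i])) {
                writeCodePoint(Character.toCodePoint(pendingHigh, buf[i]));
                i++;
            } else {
                writeCodePoint(0xFFFD); // assumption: placeholder for an isolated high surrogate
            }
            pendingHigh = 0;
        }

        for (; i < end; i++) {
            char c = buf[i];
            if (Character.isHighSurrogate(c)) {
                if (i + 1 == end) {
                    pendingHigh = c; // the low half may arrive in the next chunk
                } else if (Character.isLowSurrogate(buf[i + 1])) {
                    writeCodePoint(Character.toCodePoint(c, buf[i + 1]));
                    i++; // consume the low surrogate as well
                } else {
                    writeCodePoint(0xFFFD); // assumption: placeholder policy
                }
            } else if (Character.isLowSurrogate(c)) {
                writeCodePoint(0xFFFD); // low surrogate with no preceding high
            } else {
                writeCodePoint(c);
            }
        }
    }

    /** Flushes any unmatched high surrogate and returns the serialized text. */
    public String finish() {
        if (pendingHigh != 0) {
            writeCodePoint(0xFFFD);
            pendingHigh = 0;
        }
        return out.toString();
    }

    // ASCII is written literally; everything else as a decimal character reference.
    private void writeCodePoint(int cp) {
        if (cp < 0x80) {
            out.append((char) cp);
        } else {
            out.append("&#").append(cp).append(';');
        }
    }

    public static void main(String[] args) {
        SurrogateBoundarySketch s = new SurrogateBoundarySketch();
        char[] first = {'a', '\uD83D'};   // ends on the high half of U+1F600
        char[] second = {'\uDE00', 'b'};  // starts with the low half
        s.characters(first, 0, first.length);
        s.characters(second, 0, second.length);
        System.out.println(s.finish());   // prints a&#128512;b
    }
}
```

A regression test along these lines would feed the same text once as a single buffer and once split between the two halves of the pair, then check that both outputs are identical.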
Attachments
Issue Links
- split to: XALANJ-2730 Review handling of isolated UTF16 surrogate characters in serializer (Open)
- links to