  1. XalanJ2
  2. XALANJ-2725

Possible buffer-boundary issue when serializing surrogate pairs

Details

    • Patch

    Description

      XALANJ-2419 addressed a case where "astral" Unicode characters, requiring a surrogate pair (two UTF-16 units), were not being serialized correctly. We have a proposed fix for that.
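
      For reference, a minimal sketch of what "requiring a surrogate pair" means in Java terms. The class and method names here (SurrogatePairDemo, toCharRef) are hypothetical and are not taken from the Xalan serializer; only the java.lang.Character calls are standard API. It assumes the desired output for an astral character is a single numeric character reference for the combined code point.

```java
/** Hypothetical demo class, not part of Xalan. */
final class SurrogatePairDemo {

    /** Formats a code point as a hexadecimal XML numeric character reference. */
    static String toCharRef(int codePoint) {
        return "&#x" + Integer.toHexString(codePoint).toUpperCase() + ";";
    }

    public static void main(String[] args) {
        // U+1F600 lies outside the Basic Multilingual Plane.
        String s = new String(Character.toChars(0x1F600));
        char high = s.charAt(0);
        char low = s.charAt(1);

        System.out.println(s.length());                      // 2 (two UTF-16 units)
        System.out.println(Character.isHighSurrogate(high)); // true
        System.out.println(Character.isLowSurrogate(low));   // true

        // Both halves are needed to recover the code point.
        int codePoint = Character.toCodePoint(high, low);
        System.out.println(toCharRef(codePoint));            // &#x1F600;
    }
}
```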

      An edge case reportedly remains: a surrogate pair that straddles a buffer boundary might not be handled correctly. maxfortun offered what looks like a reasonable fix (https://github.com/maxfortun/xalan-j/blob/a9bd5591d9f8a523548aeec091e886b64c691628/src/org/apache/xml/serializer/ToStream.java#L1607), but in my testing it did not serialize the surrogate pairs correctly and caused regressions in the tests that XALANJ-2419 introduced. I don't know whether that's because we're taking multiple paths through

      But the edge case does appear to be real, and if so we will need some such solution.
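
      To make the edge case concrete, here is a rough sketch of one way a chunked writer could carry a high surrogate over from one buffer to the next. This is not the ToStream code and not maxfortun's patch; ChunkedSurrogateWriter and its methods are hypothetical names. It assumes, for illustration only, that every astral character is emitted as a numeric character reference, that unpaired surrogates are replaced with U+FFFD, and that markup escaping (<, &) is handled elsewhere.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.Writer;

/** Hypothetical helper, not part of Xalan: pairs surrogates across chunked calls. */
final class ChunkedSurrogateWriter {
    private final Writer out;
    // High surrogate held over until its partner arrives; 0 means "none pending".
    private char pendingHigh;

    ChunkedSurrogateWriter(Writer out) {
        this.out = out;
    }

    /** Writes one chunk; a pair may be split between two consecutive calls. */
    void characters(char[] buf, int start, int length) throws IOException {
        int end = start + length;
        for (int i = start; i < end; i++) {
            char c = buf[i];
            if (pendingHigh != 0) {
                char high = pendingHigh;
                pendingHigh = 0;
                if (Character.isLowSurrogate(c)) {
                    writeCharRef(Character.toCodePoint(high, c));
                    continue;
                }
                // Unpaired high surrogate (replacement policy is an assumption).
                writeCharRef(0xFFFD);
            }
            if (Character.isHighSurrogate(c)) {
                pendingHigh = c; // completed later in this chunk or in the next one
            } else if (Character.isLowSurrogate(c)) {
                writeCharRef(0xFFFD); // stray low surrogate
            } else {
                out.write(c);
            }
        }
    }

    /** Call at end of text so a dangling high surrogate is not silently lost. */
    void flush() throws IOException {
        if (pendingHigh != 0) {
            pendingHigh = 0;
            writeCharRef(0xFFFD);
        }
        out.flush();
    }

    private void writeCharRef(int codePoint) throws IOException {
        out.write("&#x" + Integer.toHexString(codePoint).toUpperCase() + ";");
    }

    public static void main(String[] args) throws IOException {
        char[] pair = Character.toChars(0x1F600);
        StringWriter sink = new StringWriter();
        ChunkedSurrogateWriter w = new ChunkedSurrogateWriter(sink);
        w.characters(new char[] {'a', pair[0]}, 0, 2); // chunk ends on the high surrogate
        w.characters(new char[] {pair[1], 'b'}, 0, 2); // next chunk starts with the low one
        w.flush();
        System.out.println(sink); // a&#x1F600;b
    }
}
```

      The point of the sketch is that the pairing state lives in a field rather than in a local variable, so every code path that receives characters goes through the same carry-over logic regardless of where the buffer happens to end.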

            People

              keshlam@alum.mit.edu Joe Kesselman
              keshlam@alum.mit.edu Joe Kesselman
              Votes: 0
              Watchers: 4

              Dates

                Created:
                Updated:
                Resolved:

                Time Tracking

                  Estimated: 168h
                  Remaining: 168h
                  Logged: Not Specified