  1. Commons Lang
  2. LANG-729

StringEscapeUtils.unescapeXml(str) does not support supplemental characters.

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Trivial
    • Resolution: Fixed
    • Affects Version/s: 2.6
    • Fix Version/s: 3.0
    • Component/s: lang.*
    • Labels:

      Description

      StringEscapeUtils.unescapeXml(str) does not unescape numeric character references to supplementary characters (code points above U+FFFF):

      String str2 = StringEscapeUtils.unescapeXml("&#144308;");
      System.out.println(str2.codePointAt(0));
      // prints 38, the code point of '&'

      The output should be 144308, the code point of the referenced character (U+233B4).
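      To illustrate the expected behavior, here is a minimal sketch of decoding numeric character references by code point rather than by char. It is not the Commons Lang implementation; the class name NumericEntityDecoder and its decode method are hypothetical, and only JDK classes are used.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not part of Commons Lang.
class NumericEntityDecoder {
    // Matches decimal (&#144308;) and hexadecimal (&#x233B4;) references.
    private static final Pattern NCR =
            Pattern.compile("&#(?:[xX]([0-9a-fA-F]{1,6})|([0-9]{1,7}));");

    static String decode(String input) {
        Matcher m = NCR.matcher(input);
        StringBuilder out = new StringBuilder();
        int last = 0;
        while (m.find()) {
            // Copy the text between the previous match and this one.
            out.append(input, last, m.start());
            int codePoint = (m.group(1) != null)
                    ? Integer.parseInt(m.group(1), 16)
                    : Integer.parseInt(m.group(2), 10);
            if (Character.isValidCodePoint(codePoint)) {
                // appendCodePoint emits a surrogate pair for code points above U+FFFF.
                out.appendCodePoint(codePoint);
            } else {
                // Out-of-range references are left untouched.
                out.append(m.group());
            }
            last = m.end();
        }
        out.append(input, last, input.length());
        return out.toString();
    }

    public static void main(String[] args) {
        String s = decode("&#144308;");
        System.out.println(s.codePointAt(0)); // 144308
        System.out.println(s.length());       // 2 (one surrogate pair)
    }
}
```

      The key point is that the decoded value is treated as an int code point, so a single reference can expand to two Java chars.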

      Currently, StringEscapeUtils.unescapeXml(StringEscapeUtils.escapeXml(str)) is equal to str, so the problem is not visible in a round trip. However, as we reported in LANG-728, StringEscapeUtils.escapeXml(str) has a bug. Once that bug is fixed, StringEscapeUtils.unescapeXml(StringEscapeUtils.escapeXml(str)) will no longer be equal to str for such input, which we do not want. (Of course, we do not expect the round trip to reproduce str for every possible input.)
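      The round-trip concern can be sketched as follows. This assumes a hypothetical escaper that writes each non-ASCII code point as one decimal reference; the class XmlRoundTrip and its methods are illustrative only and do not reproduce the Commons Lang code. With such an escaper, the round trip holds only if the unescaper also works on code points.

```java
// Hypothetical escaper/unescaper pair for illustration; not Commons Lang code.
class XmlRoundTrip {
    // Assumption: non-ASCII code points are written as one decimal reference each.
    static String escape(String s) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            switch (cp) {
                case '&':  out.append("&amp;");  break;
                case '<':  out.append("&lt;");   break;
                case '>':  out.append("&gt;");   break;
                case '"':  out.append("&quot;"); break;
                case '\'': out.append("&apos;"); break;
                default:
                    if (cp > 0x7F) {
                        out.append("&#").append(cp).append(';');
                    } else {
                        out.append((char) cp);
                    }
            }
            i += Character.charCount(cp); // advance by 1 or 2 chars
        }
        return out.toString();
    }

    static String unescape(String s) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < s.length()) {
            char c = s.charAt(i);
            int semi = (c == '&') ? s.indexOf(';', i) : -1;
            String replacement = (semi > 0) ? resolve(s.substring(i + 1, semi)) : null;
            if (replacement != null) {
                out.append(replacement);
                i = semi + 1;
            } else {
                out.append(c); // not a recognized reference; copy as-is
                i++;
            }
        }
        return out.toString();
    }

    // Returns the replacement text for a reference name, or null if unknown.
    private static String resolve(String name) {
        switch (name) {
            case "amp":  return "&";
            case "lt":   return "<";
            case "gt":   return ">";
            case "quot": return "\"";
            case "apos": return "'";
        }
        if (name.matches("#[0-9]{1,7}")) {
            int cp = Integer.parseInt(name.substring(1));
            if (Character.isValidCodePoint(cp)) {
                return new String(Character.toChars(cp));
            }
        }
        return null;
    }

    public static void main(String[] args) {
        String original = "a<b & " + new String(Character.toChars(0x233B4));
        String escaped = escape(original);
        System.out.println(escaped);
        System.out.println(unescape(escaped).equals(original));
    }
}
```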

            People

            • Assignee: Unassigned
            • Reporter: Taro Yabuki (yabuki)