Uploaded image for project: 'Commons Lang'
  1. Commons Lang
  2. LANG-858

StringEscapeUtils.escapeJava() and escapeEcmaScript() do not output the escaped surrogate pairs that are Java parsable

    XMLWordPrintableJSON

Details

    Description

      In case of Java and ECMA Script, the style of unicode escape '\uxxxxxx' cannot be accepted. We need to separate it into high-surrogate and low-surrogate.

      For example, you put the surrogate pair

      '\uDBFF\uDFFD'
      

      output must be

      "\\uDBFF\\uDFFD"
      

      However you get

      "\\u10FFFD"
      

      Test case here:

      @Test
      public void testEscapeSurrogatePairs() throws Exception {
          assertEquals("\\uDBFF\\uDFFD", StringEscapeUtils.escapeJava("\uDBFF\uDFFD"));
          assertEquals("\\uDBFF\\uDFFD", StringEscapeUtils.escapeEcmaScript("\uDBFF\uDFFD"));
      }
      

      I attached the patch which implements simple solution.
      But UnicodeEscaper.java should not be specified for Java, I think. We need to discuss about it.

      This issue does not be appeared in unescape method.

      Attachments

        1. JavaUnicodeEscape.patch
          0.8 kB
          Kazuki Hamasaki

        Activity

          People

            Unassigned Unassigned
            ashphy Kazuki Hamasaki
            Votes:
            0 Vote for this issue
            Watchers:
            2 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: