Commons Lang / LANG-646

StringEscapeUtils.unescapeJava doesn't handle octal escapes and Unicode with extra u

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 2.5
    • Fix Version/s: 3.0
    • Component/s: lang.*
    • Labels:
      None
    • Environment:

      Irrelevant

      Description

      CODE TO REPRODUCE BUG:

      System.out.println("\45");
      // %
      System.out.println(StringEscapeUtils.unescapeJava("\\45"));
      // 45, should be %
      
      System.out.println("\uu0030");
      // 0
      System.out.println(StringEscapeUtils.unescapeJava("\\uu0030"));
      // throws NestableRuntimeException, should be 0
      

      This is not compliant with the JLS, which allows both [OctalEscape] and extra u characters in [UnicodeMarker] in Java string literals.
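The "extra u" rule from JLS 3.3 can be illustrated with a minimal, standalone sketch. The class and method names below (UnicodeMarkerSketch, unescapeUnicode) are hypothetical and are not part of Commons Lang. The sketch assumes that a malformed escape should be copied through unchanged rather than raise an exception, and that an escaped backslash pair never starts a Unicode escape.

```java
// Hypothetical illustration only; this is not the Commons Lang implementation.
final class UnicodeMarkerSketch {

    /** Translates backslash + one or more 'u' + four hex digits into a single char. */
    static String unescapeUnicode(String input) {
        StringBuilder out = new StringBuilder(input.length());
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            boolean hasNext = i + 1 < input.length();
            // An escaped backslash is copied as a pair so it cannot start a Unicode escape.
            if (c == '\\' && hasNext && input.charAt(i + 1) == '\\') {
                out.append("\\\\");
                i += 2;
                continue;
            }
            if (c == '\\' && hasNext && input.charAt(i + 1) == 'u') {
                int j = i + 1;
                // UnicodeMarker: skip every consecutive 'u', not just the first one.
                while (j < input.length() && input.charAt(j) == 'u') {
                    j++;
                }
                if (j + 4 <= input.length() && isHex(input, j, j + 4)) {
                    out.append((char) Integer.parseInt(input.substring(j, j + 4), 16));
                    i = j + 4;
                    continue;
                }
                // Assumption: a malformed escape falls through and is copied unchanged.
            }
            out.append(c);
            i++;
        }
        return out.toString();
    }

    private static boolean isHex(String s, int from, int to) {
        for (int k = from; k < to; k++) {
            if (Character.digit(s.charAt(k), 16) < 0) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(unescapeUnicode("\\u0030"));   // single u
        System.out.println(unescapeUnicode("\\uu0030"));  // extra u, same character
    }
}
```

Both calls in main receive the six or seven characters a Java compiler would see in source form, which is the same input shape the report passes to unescapeJava.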

      REFERENCES:

      3.10.6 Escape Sequences for Character and String Literals
      http://java.sun.com/docs/books/jls/third_edition/html/lexical.html#3.10.6

      3.3 Unicode Escapes
      http://java.sun.com/docs/books/jls/third_edition/html/lexical.html#3.3

      EXTERNAL LINKS:

      http://stackoverflow.com/questions/3537706/howto-unescape-a-java-string-literal-in-java/

        Activity

        polygenelubricants created issue -
        Henri Yandell added a comment -

        The 'uu' version isn't a problem in the current 3.0 codebase; support for it was added as part of creating a UnicodeUnescaper.

        The octal escape isn't handled, though, and needs to be added as a new Unescaper/Escaper pair in the text.translate package.

        Henri Yandell added a comment -

        Escaper is easy to write; unescaper is a bit of a pain. Ideally the generic unescaper would know how to unescape '\510' happily, but Java adds the constraint of a max of \377, leading to that being inferred as '\51' + '0'. Thus the unescaper needs to have a configurable range, and as it plucks numbers off the text it needs to check whether it has gone beyond the maximum size.
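The \377 ceiling described in this comment follows from the JLS 3.10.6 grammar: an octal escape may have three digits only when the first digit is 0 to 3, otherwise at most two. A minimal sketch of that rule follows. The names OctalSketch and unescapeOctal are hypothetical, and this is not the OctalUnescaper that was later committed; the range is hard-coded here rather than configurable.

```java
// Hypothetical illustration only; this is not the committed OctalUnescaper.
final class OctalSketch {

    /** Translates backslash + octal digits, following the JLS 3.10.6 digit limits. */
    static String unescapeOctal(String input) {
        StringBuilder out = new StringBuilder(input.length());
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            boolean hasNext = i + 1 < input.length();
            // An escaped backslash is copied as a pair so it cannot start an octal escape.
            if (c == '\\' && hasNext && input.charAt(i + 1) == '\\') {
                out.append("\\\\");
                i += 2;
                continue;
            }
            if (c == '\\' && hasNext && isOctal(input.charAt(i + 1))) {
                // A leading 0-3 allows three digits (max 377); a leading 4-7 allows two.
                int maxDigits = input.charAt(i + 1) <= '3' ? 3 : 2;
                int start = i + 1;
                int j = start;
                int value = 0;
                while (j < input.length() && j - start < maxDigits && isOctal(input.charAt(j))) {
                    value = value * 8 + (input.charAt(j) - '0');
                    j++;
                }
                out.append((char) value);
                i = j;
                continue;
            }
            out.append(c);
            i++;
        }
        return out.toString();
    }

    private static boolean isOctal(char ch) {
        return ch >= '0' && ch <= '7';
    }

    public static void main(String[] args) {
        System.out.println(unescapeOctal("\\45"));   // octal 45 is '%'
        System.out.println(unescapeOctal("\\510"));  // read as \51 followed by a literal '0'
    }
}
```

Capping the digit count by the first digit keeps every decoded value at or below octal 377 without a separate range check, which is one way to get the "max of 377" behaviour the comment asks for.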

        polygenelubricants added a comment -

        By the way Henri, the double backslashes in your comment are rendered as newlines somehow. I also had the same problem with the original bug report before I figured out how to quote code. This seems to be a bug in the bug tracking system's renderer.

        Let's reproduce this again
        there you go.

        Henri Yandell made changes -
        Fix Version/s 3.0 [ 12311714 ]
        Henri Yandell added a comment -

        svn ci -m "Adding an OctalUnescaper to handle Java's support of 1->377 Octal values. LANG-646"
        Sending src/main/java/org/apache/commons/lang3/StringEscapeUtils.java
        Adding src/main/java/org/apache/commons/lang3/text/translate/OctalUnescaper.java
        Adding src/test/java/org/apache/commons/lang3/text/translate/OctalUnescaperTest.java
        Transmitting file data ...
        Committed revision 1059753.

        I didn't see much point in an OctalEscaper, so I didn't bother adding that. When escaping Java, we wouldn't know to escape a particular character to Octal for the aesthetic value.

        It also only supports Java's 1->377 octal range; this is because Integer.parseInt(..., 8) only supports that. I didn't see any point in trying to do better than that given that the use case is primarily for Java at the moment.

        Henri Yandell made changes -
        Status: Open [ 1 ] -> Closed [ 6 ]
        Resolution: Fixed [ 1 ]
        Mark Thomas made changes -
        Workflow: jira [ 12518599 ] -> Default workflow, editable Closed status [ 12602540 ]
        Transition: Open -> Closed
        Time In Source Status: 147d 11h 49m
        Execution Times: 1
        Last Executer: Henri Yandell
        Last Execution Date: 17/Jan/11 05:35

          People

          • Assignee: Unassigned
          • Reporter: polygenelubricants
          • Votes: 0
          • Watchers: 1
