Commons Lang / LANG-607

StringUtils methods do not handle Unicode 2.0+ supplementary characters correctly.

    Details

    • Type: Bug
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: 2.5
    • Fix Version/s: Patch Needed
    • Component/s: lang.*
    • Labels: None
    • Environment:

      Description

      The StringUtils.containsAny methods incorrectly match Unicode 2.0+ supplementary characters.

      For example, define a test fixture to be the Unicode character U+20000, which is written in Java source as the surrogate pair "\uD840\uDC00":

      private static final String CharU20000 = "\uD840\uDC00";
      private static final String CharU20001 = "\uD840\uDC01";
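
      To make the fixtures above easier to check, here is a minimal sketch of how a supplementary code point maps to its UTF-16 surrogate pair. The class name SupplementaryDemo and the helper toEscapes are hypothetical names used only for illustration; Character.toChars and String.codePointCount are standard JRE calls.

```java
final class SupplementaryDemo {
    /** Hypothetical helper: renders the UTF-16 code units of a code point as four-digit hex escapes. */
    static String toEscapes(int codePoint) {
        StringBuilder sb = new StringBuilder();
        // Character.toChars returns one char for BMP code points, two for supplementary ones.
        for (char unit : Character.toChars(codePoint)) {
            sb.append(String.format("\\u%04X", (int) unit));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String charU20000 = "\uD840\uDC00";
        // Prints the escapes for the code units D840 and DC00.
        System.out.println(toEscapes(0x20000));
        // Two UTF-16 code units, but only one code point.
        System.out.println(charU20000.length());
        System.out.println(charU20000.codePointCount(0, charU20000.length()));
    }
}
```

      Note that U+20000 and U+20001 share the same high surrogate (D840) and differ only in the low surrogate, which is what makes them a useful pair of fixtures.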

      You can see Unicode supplementary characters correctly implemented in the JRE call:

      assertEquals(-1, CharU20000.indexOf(CharU20001));

      But this is broken:

      assertEquals(false, StringUtils.containsAny(CharU20000, CharU20001));
      assertEquals(false, StringUtils.containsAny(CharU20001, CharU20000));

      This is fine:

      assertEquals(true, StringUtils.contains(CharU20000 + CharU20001, CharU20000));
      assertEquals(true, StringUtils.contains(CharU20000 + CharU20001, CharU20001));
      assertEquals(true, StringUtils.contains(CharU20000, CharU20000));
      assertEquals(false, StringUtils.contains(CharU20000, CharU20001));

      because the method calls the JRE to perform the match.
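
      The contrast can be sketched as follows. This assumes that containsAny compares individual UTF-16 code units rather than whole code points; the report does not show the implementation, so both methods below are hypothetical illustrations and not the Commons Lang source or the attached patch.

```java
final class ContainsAnySketch {
    /** Hypothetical char-by-char matcher: compares single UTF-16 code units. */
    static boolean containsAnyByChar(String str, String searchChars) {
        for (int i = 0; i < str.length(); i++) {
            // A lone high surrogate matches the high surrogate of a different character.
            if (searchChars.indexOf(str.charAt(i)) >= 0) {
                return true;
            }
        }
        return false;
    }

    /** Hypothetical code-point-aware matcher: compares whole code points. */
    static boolean containsAnyByCodePoint(String str, String searchChars) {
        for (int i = 0; i < str.length(); ) {
            int cp = str.codePointAt(i);
            // String.indexOf(int) accepts supplementary code points.
            if (searchChars.indexOf(cp) >= 0) {
                return true;
            }
            // Advance by one or two chars depending on the code point.
            i += Character.charCount(cp);
        }
        return false;
    }

    public static void main(String[] args) {
        String charU20000 = "\uD840\uDC00";
        String charU20001 = "\uD840\uDC01";
        // The char-by-char version reports a match because both strings start with the same high surrogate.
        System.out.println(containsAnyByChar(charU20000, charU20001));
        // The code-point version compares U+20000 against U+20001 and finds no match.
        System.out.println(containsAnyByCodePoint(charU20000, charU20001));
    }
}
```

      Under that assumption, the char-by-char version reproduces the failing assertions above, while iterating by code point gives the same answers as the JRE-backed contains calls.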

      More than you want to know:

        Attachments

        1. LANG-607.diff
          5 kB
          ggregory@seagullsw.com

            People

            • Assignee: Gary D. Gregory (ggregory)
            • Reporter: Gary Gregory (ggregory1)
            • Votes: 0
            • Watchers: 1
