Uploaded image for project: 'Commons Lang'
  1. Commons Lang
  2. LANG-607

StringUtils methods do not handle Unicode 2.0+ supplementary characters correctly.

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Closed
    • Minor
    • Resolution: Fixed
    • 2.5
    • 3.14.0
    • lang.*
    • None

    Description

      StringUtils.containsAny methods incorrectly matches Unicode 2.0+ supplementary characters.

      For example, define a test fixture to be the Unicode character U+20000 where U+20000 is written in Java source as "\uD840\uDC00"

      private static final String CharU20000 = "\uD840\uDC00";
      private static final String CharU20001 = "\uD840\uDC01";

      You can see Unicode supplementary characters correctly implemented in the JRE call:

      assertEquals(-1, CharU20000.indexOf(CharU20001));

      But this is broken:

      assertEquals(false, StringUtils.containsAny(CharU20000, CharU20001));
      assertEquals(false, StringUtils.containsAny(CharU20001, CharU20000));

      This is fine:

      assertEquals(true, StringUtils.contains(CharU20000 + CharU20001, CharU20000));
      assertEquals(true, StringUtils.contains(CharU20000 + CharU20001, CharU20001));
      assertEquals(true, StringUtils.contains(CharU20000, CharU20000));
      assertEquals(false, StringUtils.contains(CharU20000, CharU20001));

      because the method calls the JRE to perform the match.

      More than you want to know:

      Attachments

        1. LANG-607.diff
          5 kB
          ggregory@seagullsw.com

        Activity

          People

            ggregory Gary D. Gregory
            ggregory1 Gary Gregory
            Votes:
            0 Vote for this issue
            Watchers:
            2 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: