Commons Lang / LANG-607

StringUtils methods do not handle Unicode 2.0+ supplementary characters correctly.


Details

    • Type: Bug
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: 2.5
    • Fix Version/s: Patch Needed
    • Component/s: lang.*
    • Labels: None

    Description

      The StringUtils.containsAny methods incorrectly match Unicode 2.0+ supplementary characters.

      For example, define test fixtures for the Unicode characters U+20000 and U+20001. In Java source, a supplementary character is written as a surrogate pair, so U+20000 is "\uD840\uDC00":

      private static final String CharU20000 = "\uD840\uDC00";
      private static final String CharU20001 = "\uD840\uDC01";
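
      As a rough illustration of where these escapes come from, the sketch below derives the UTF-16 surrogate pair for a supplementary code point by hand. The class name SurrogatePairDemo and the helper toSurrogates are hypothetical names used only for this example; they are not part of Commons Lang. The JRE offers Character.toChars for the same purpose.

```java
class SurrogatePairDemo {

    // Hypothetical helper: encode a supplementary code point
    // (U+10000..U+10FFFF) as a UTF-16 high/low surrogate pair.
    static char[] toSurrogates(int codePoint) {
        int offset = codePoint - 0x10000;                // 20-bit value
        char high = (char) (0xD800 + (offset >> 10));    // top 10 bits
        char low = (char) (0xDC00 + (offset & 0x3FF));   // bottom 10 bits
        return new char[] { high, low };
    }

    public static void main(String[] args) {
        String charU20000 = "\uD840\uDC00";
        char[] pair = toSurrogates(0x20000);

        // The hand-computed pair and the JRE's Character.toChars should agree.
        System.out.println(new String(pair).equals(charU20000));
        System.out.println(new String(Character.toChars(0x20000)).equals(charU20000));

        // One code point, but two char values (UTF-16 code units).
        System.out.println(charU20000.length());
        System.out.println(charU20000.codePointCount(0, charU20000.length()));
    }
}
```

      The point relevant to this issue is that U+20000 and U+20001 share the same high surrogate (D840) and differ only in the low surrogate, so any comparison made one char at a time sees a common element.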

      You can see Unicode supplementary characters correctly implemented in the JRE call:

      assertEquals(-1, CharU20000.indexOf(CharU20001));

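      But this is broken. Both assertions fail, because containsAny compares individual char values and the two strings share the high surrogate \uD840: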

      assertEquals(false, StringUtils.containsAny(CharU20000, CharU20001));
      assertEquals(false, StringUtils.containsAny(CharU20001, CharU20000));
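
      To make the failure mode concrete, here is a minimal sketch contrasting a char-by-char search with a code-point-aware one. Both methods (naiveContainsAny and codePointContainsAny) are hypothetical and written only for illustration; they are not the Commons Lang source and not a proposed patch. The second one assumes well-formed UTF-16 input.

```java
class ContainsAnySketch {

    // Hypothetical char-by-char search: compares UTF-16 code units,
    // so two different supplementary characters that share a surrogate match.
    static boolean naiveContainsAny(String str, String searchChars) {
        for (int i = 0; i < str.length(); i++) {
            if (searchChars.indexOf(str.charAt(i)) >= 0) {
                return true;
            }
        }
        return false;
    }

    // Hypothetical code-point-aware search: walks str one code point at a
    // time and relies on String.indexOf(int), which accepts supplementary
    // code points. Assumes str contains no unpaired surrogates.
    static boolean codePointContainsAny(String str, String searchChars) {
        for (int i = 0; i < str.length(); ) {
            int codePoint = str.codePointAt(i);
            if (searchChars.indexOf(codePoint) >= 0) {
                return true;
            }
            i += Character.charCount(codePoint); // advance 1 or 2 chars
        }
        return false;
    }

    public static void main(String[] args) {
        String charU20000 = "\uD840\uDC00";
        String charU20001 = "\uD840\uDC01";

        System.out.println(naiveContainsAny(charU20000, charU20001));     // true (wrong)
        System.out.println(codePointContainsAny(charU20000, charU20001)); // false
        System.out.println(codePointContainsAny(charU20000, charU20000)); // true
    }
}
```

      Iterating by code point keeps each surrogate pair together, which is the behavior the assertions above expect.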

      This is fine:

      assertEquals(true, StringUtils.contains(CharU20000 + CharU20001, CharU20000));
      assertEquals(true, StringUtils.contains(CharU20000 + CharU20001, CharU20001));
      assertEquals(true, StringUtils.contains(CharU20000, CharU20000));
      assertEquals(false, StringUtils.contains(CharU20000, CharU20001));

      because StringUtils.contains delegates to the JRE to perform the match.
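
      A minimal sketch of why delegation is safe here, assuming the method simply forwards to String.indexOf(String). The class ContainsSketch and its contains method are hypothetical stand-ins, not the library code. A substring search must match every char of the search string in order, so both halves of a surrogate pair have to line up.

```java
class ContainsSketch {

    // Hypothetical stand-in for a contains method that defers to the JRE.
    // String.indexOf(String) matches the whole char sequence, so a shared
    // high surrogate alone is not enough to produce a hit.
    static boolean contains(String str, String searchStr) {
        if (str == null || searchStr == null) {
            return false;
        }
        return str.indexOf(searchStr) >= 0;
    }

    public static void main(String[] args) {
        String charU20000 = "\uD840\uDC00";
        String charU20001 = "\uD840\uDC01";

        System.out.println(contains(charU20000 + charU20001, charU20001)); // true
        System.out.println(contains(charU20000, charU20001));              // false
    }
}
```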

      More than you want to know:



          People

            ggregory Gary D. Gregory
            ggregory1 Gary Gregory
