Commons Text / TEXT-209

LookupTranslator returns count of chars consumed, not of codepoints consumed


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 1.9
    • Fix Version/s: 1.10.0

    Description

      The contract of abstract method translate in the class CharSequenceTranslator, and therefore also in the inherited LookupTranslator, is to return the "int count of codepoints consumed".

      Cf. their javadoc.

      However, LookupTranslator returns the number of chars.

      This can be seen in its source, in its implementation of the abstract method, where it returns "i", which is the length in chars of the longest matching substring.
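
To make the distinction between the two counts concrete, here is a minimal sketch. The class name CharVsCodePoint and its two helper methods are hypothetical and not part of Commons Text; they only wrap the standard CharSequence.length() and Character.codePointCount calls to show that a string holding one supplementary character has two chars but one codepoint.

```java
final class CharVsCodePoint {

    /** Number of UTF-16 code units (Java chars) in the sequence. */
    static int charCount(CharSequence s) {
        return s.length();
    }

    /** Number of Unicode codepoints in the sequence. */
    static int codePointCount(CharSequence s) {
        return Character.codePointCount(s, 0, s.length());
    }

    public static void main(String[] args) {
        // Mathematical double-struck capital A (U+1D538), stored as a surrogate pair
        String symbol = new StringBuilder().appendCodePoint(0x1D538).toString();

        System.out.println(charCount(symbol));      // prints 2
        System.out.println(codePointCount(symbol)); // prints 1
    }
}
```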

      Test to reproduce:

      Define a mapping where a String with 1 supplementary character is mapped to 1 (basic) char.

      /* Key: string with Mathematical double-struck capital A (U+1D538) */
      String symbol = new StringBuilder().appendCodePoint(0x1D538).toString();
      
      /* Map U+1D538 to "A" */
      Map<CharSequence, CharSequence> map = new HashMap<>();
      map.put(symbol, "A");
      
      LookupTranslator translator = new LookupTranslator(map);
      String translated = translator.translate(symbol + "=A");
      		
      /* Fails: instead of "A=A", we get "AA". */
      assertEquals("A=A", translated);
      
      

      During translation the supplementary character is mapped correctly, but LookupTranslator then erroneously swallows the following "=" character.

      That is because its translate method returns the count of matched chars (here 2: the high and low surrogate code units that form the surrogate pair) instead of the count of matched codepoints (here 1, which is what the javadoc promises).
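
The effect can be reproduced without the library. The sketch below is a simplified model, not the Commons Text source: the class ConsumedCountSketch, its methods, and the countCodePoints flag are all hypothetical. It assumes a driver loop that, per the javadoc contract, advances by the returned number of codepoints, and a lookup step that returns either the char length of the matched key (the reported behaviour) or its codepoint count (one possible way to honour the contract).

```java
import java.util.HashMap;
import java.util.Map;

final class ConsumedCountSketch {

    /**
     * Hypothetical lookup step: finds the longest key starting at index,
     * appends its replacement, and returns how much input was "consumed".
     */
    static int lookup(Map<String, String> map, String input, int index,
                      StringBuilder out, boolean countCodePoints) {
        for (int end = input.length(); end > index; end--) {
            String candidate = input.substring(index, end);
            String replacement = map.get(candidate);
            if (replacement != null) {
                out.append(replacement);
                return countCodePoints
                        ? candidate.codePointCount(0, candidate.length())
                        : candidate.length(); // char count: the reported behaviour
            }
        }
        return 0; // nothing matched at this position
    }

    /** Hypothetical driver: treats the returned value as a codepoint count. */
    static String translate(Map<String, String> map, String input,
                            boolean countCodePoints) {
        StringBuilder out = new StringBuilder();
        int pos = 0;
        while (pos < input.length()) {
            int consumed = lookup(map, input, pos, out, countCodePoints);
            if (consumed == 0) {
                // No match: copy one codepoint through unchanged
                int cp = input.codePointAt(pos);
                out.appendCodePoint(cp);
                pos += Character.charCount(cp);
                continue;
            }
            // Skip 'consumed' codepoints, each of which may span 1 or 2 chars
            for (int i = 0; i < consumed && pos < input.length(); i++) {
                pos += Character.charCount(input.codePointAt(pos));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String symbol = new StringBuilder().appendCodePoint(0x1D538).toString();
        Map<String, String> map = new HashMap<>();
        map.put(symbol, "A");

        System.out.println(translate(map, symbol + "=A", false)); // prints AA
        System.out.println(translate(map, symbol + "=A", true));  // prints A=A
    }
}
```

With the char count, the driver skips two codepoints (the symbol and the "="); with the codepoint count it skips only the symbol.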


          People

            Assignee: kinow Bruno P. Kinoshita
            Reporter: belpk K P

              Time Tracking

                Original Estimate: Not Specified
                Remaining Estimate: 0h
                Time Spent: 0.5h
