  1. Commons Text
  2. TEXT-209

LookupTranslator returns count of chars consumed, not of codepoints consumed

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 1.9
    • Fix Version/s: 1.10.0

    Description

      The contract of the abstract method translate in the class CharSequenceTranslator, and therefore also in its subclass LookupTranslator, is to return the "int count of codepoints consumed".

      See the javadoc of both classes.

      However, LookupTranslator returns the number of chars (UTF-16 code units) instead.

      This can be seen in its source: its implementation of the abstract method returns "i", which is the length in chars of the longest matching substring.
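
The difference between the two counts can be illustrated with the standard library alone. The following is a minimal sketch; the class and method names (CountSketch, charCount, codePointCount) are made up for illustration and are not part of Commons Text.

```java
// Hypothetical helper class, for illustration only (not part of Commons Text).
final class CountSketch {

    /** Number of UTF-16 code units (Java chars) in s. */
    static int charCount(String s) {
        return s.length();
    }

    /** Number of Unicode code points in s. */
    static int codePointCount(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        // U+1D538 lies outside the Basic Multilingual Plane,
        // so Java stores it as a surrogate pair.
        String symbol = new StringBuilder().appendCodePoint(0x1D538).toString();
        System.out.println(charCount(symbol));      // prints 2
        System.out.println(codePointCount(symbol)); // prints 1
    }
}
```

For strings made up only of basic (BMP) characters both counts agree, which is presumably why the mismatch goes unnoticed until a supplementary character appears in a key.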

      Test to reproduce:

      Define a mapping where a String with 1 supplementary character is mapped to 1 (basic) char.

      /* Key: string with Mathematical double-struck capital A (U+1D538) */
      String symbol = new StringBuilder().appendCodePoint(0x1D538).toString();
      
      /* Map U+1D538 to "A" */
      Map<CharSequence, CharSequence> map = new HashMap<>();
      map.put(symbol, "A");
      
      LookupTranslator translator = new LookupTranslator(map);
      String translated = translator.translate(symbol + "=A");
      		
      /* Fails: instead of "A=A", we get "AA". */
      assertEquals("A=A", translated);
      
      

      So the supplementary character is mapped correctly, but LookupTranslator then erroneously swallows the following "=" character.

      That is because its translate method returns the count of matched chars (2, i.e. the high and low surrogate code units that form the surrogate pair) instead of the count of matched codepoints (1, which is what the javadoc claims it returns). The caller then advances by that many codepoints, so it skips past the "=" as well.
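
The effect can be modelled without the library. The sketch below is a simplified stand-in, not the Commons Text source: ConsumedCountSketch, lookup and the inCodePoints flag are hypothetical names, and the driver loop only assumes that the caller advances by the returned number of code points, as the documented contract implies. With inCodePoints set to true, it also shows one possible fix, converting the matched char length into a code point count before returning.

```java
// Simplified model for illustration only; all names here are hypothetical.
final class ConsumedCountSketch {

    /**
     * Stand-in for a single-entry lookup step: if input contains key at index,
     * append the replacement and report how much input was consumed, else return 0.
     */
    static int lookup(String input, int index, String key, String replacement,
                      boolean inCodePoints, StringBuilder out) {
        if (!input.startsWith(key, index)) {
            return 0;
        }
        out.append(replacement);
        return inCodePoints
                ? key.codePointCount(0, key.length()) // count the contract asks for
                : key.length();                       // count described in this report
    }

    /** Driver that treats the returned value as a number of code points. */
    static String translate(String input, String key, String replacement,
                            boolean inCodePoints) {
        StringBuilder out = new StringBuilder();
        int pos = 0;
        while (pos < input.length()) {
            int consumed = lookup(input, pos, key, replacement, inCodePoints, out);
            if (consumed == 0) {
                // No match: copy one whole code point through unchanged.
                int cp = input.codePointAt(pos);
                out.appendCodePoint(cp);
                pos += Character.charCount(cp);
                continue;
            }
            // Match: skip 'consumed' code points, each 1 or 2 chars wide.
            // The bounds check keeps an over-reported count from running off the end.
            for (int pt = 0; pt < consumed && pos < input.length(); pt++) {
                pos += Character.charCount(input.codePointAt(pos));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String symbol = new StringBuilder().appendCodePoint(0x1D538).toString();
        System.out.println(translate(symbol + "=A", symbol, "A", false)); // prints AA
        System.out.println(translate(symbol + "=A", symbol, "A", true));  // prints A=A
    }
}
```

When the lookup reports 2, the driver skips two code points: the surrogate pair and the "=" after it. When it reports 1, only the matched symbol is skipped and the "=" survives.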

          People

            Assignee: kinow Bruno P. Kinoshita
            Reporter: belpk K P

              Time Tracking

                Original Estimate: Not Specified
                Remaining Estimate: 0h
                Time Spent: 0.5h