Details
Bug
Status: Resolved
Minor
Resolution: Fixed
1.9
Description
The contract of the abstract method translate in the class CharSequenceTranslator, which LookupTranslator inherits, is to return the "int count of codepoints consumed" (cf. the Javadoc of both classes).
However, LookupTranslator returns the number of chars instead.
This can be seen in its source: its implementation of the abstract method returns "i", which is the length in chars of the longest matching substring, not the number of code points in it.
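The difference between the two counts can be shown with the JDK alone. The following is a minimal sketch, not code from the library; the class name CharVsCodePointCount and its two helper methods are hypothetical and exist only for illustration.

```java
// Hypothetical helper class, for illustration only (not part of Commons Text).
class CharVsCodePointCount {

    /** Number of UTF-16 code units (Java chars) in the matched text. */
    public static int charCount(CharSequence matched) {
        return matched.length();
    }

    /** Number of Unicode code points in the matched text. */
    public static int codePointCount(CharSequence matched) {
        return Character.codePointCount(matched, 0, matched.length());
    }

    public static void main(String[] args) {
        // U+1D538 lies outside the Basic Multilingual Plane,
        // so Java stores it as a surrogate pair (two chars).
        String symbol = new StringBuilder().appendCodePoint(0x1D538).toString();
        System.out.println(charCount(symbol));      // prints 2
        System.out.println(codePointCount(symbol)); // prints 1
    }
}
```

For text that contains only basic (BMP) characters the two counts are equal, which is presumably why the mismatch goes unnoticed until a supplementary character appears in a lookup key.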
Test to reproduce:
Define a mapping where a String with 1 supplementary character is mapped to 1 (basic) char.
/* Key: string with Mathematical double-struck capital A (U+1D538) */
String symbol = new StringBuilder().appendCodePoint(0x1D538).toString();

/* Map U+1D538 to "A" */
Map<CharSequence, CharSequence> map = new HashMap<>();
map.put(symbol, "A");
LookupTranslator translator = new LookupTranslator(map);

String translated = translator.translate(symbol + "=A");

/* Fails: instead of "A=A", we get "AA". */
assertEquals("A=A", translated);
So the supplementary character is mapped correctly during translation, but the LookupTranslator then erroneously swallows the following "=" character.
That is because its translate method returns the count of matched chars (2, i.e. the high and low surrogate code units that form the surrogate pair) instead of the count of matched codepoints (1, which is what the javadoc claims it returns).
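The effect can be reproduced without the library. The sketch below is a simplified stand-in, not the actual Commons Text source: the class LookupCountSketch, its methods, and the countCodePoints flag are all hypothetical. It assumes a caller loop that advances one code point per unit returned, as the documented contract implies, and it shows one possible way to satisfy that contract by converting the matched char length with Character.codePointCount.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified stand-in for a lookup-based translator.
class LookupCountSketch {

    /** Longest-match lookup at index; returns how much input it claims to have consumed. */
    public static int lookup(Map<String, String> map, CharSequence input, int index,
                             StringBuilder out, boolean countCodePoints) {
        int longest = 0;
        for (String key : map.keySet()) {
            longest = Math.max(longest, key.length());
        }
        int max = Math.min(longest, input.length() - index);
        for (int i = max; i >= 1; i--) {
            String sub = input.subSequence(index, index + i).toString();
            String result = map.get(sub);
            if (result != null) {
                out.append(result);
                // Returning i reports chars; the contract asks for code points.
                return countCodePoints
                        ? Character.codePointCount(sub, 0, sub.length())
                        : i;
            }
        }
        return 0;
    }

    /** Assumed caller loop: skips one code point per unit reported by lookup. */
    public static String translate(Map<String, String> map, CharSequence input,
                                   boolean countCodePoints) {
        StringBuilder out = new StringBuilder();
        int pos = 0;
        int len = input.length();
        while (pos < len) {
            int consumed = lookup(map, input, pos, out, countCodePoints);
            if (consumed == 0) {
                // No match: copy one code point through unchanged.
                int cp = Character.codePointAt(input, pos);
                out.appendCodePoint(cp);
                pos += Character.charCount(cp);
                continue;
            }
            // The pos < len guard belongs to this sketch only; it avoids
            // reading past the end when the count is too large.
            for (int pt = 0; pt < consumed && pos < len; pt++) {
                pos += Character.charCount(Character.codePointAt(input, pos));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String symbol = new StringBuilder().appendCodePoint(0x1D538).toString();
        Map<String, String> map = new HashMap<>();
        map.put(symbol, "A");
        System.out.println(translate(map, symbol + "=A", false)); // prints AA
        System.out.println(translate(map, symbol + "=A", true));  // prints A=A
    }
}
```

With the char count, the caller skips two code points (the supplementary character and "="); with the code point count, it skips only the matched character.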