Henri Yandell made changes - 16/May/06 09:40 AM
Henri Yandell made changes - 16/May/06 11:16 AM
Henri Yandell made changes - 16/May/06 12:26 PM
Henri Yandell made changes - 14/Jul/06 12:23 PM
Henri Yandell made changes - 27/Oct/07 06:49 AM
commons-dev for the record:
I uncovered a potential bug in Metaphone. The code in question deals with
> the encoding of 'B':
>
> // START CODE from Metaphone
>
> case 'B' :
>     if ((n > 0) && !(n + 1 == wdsz) &&
>         (local.charAt(n - 1) == 'M')) { // not MB at end of word
>         code.append(symb);
>     } else {
>         code.append(symb);
>     }
>     mtsz++;
>     break;
>
> // END CODE
>
> My understanding is that we should not encode a 'B' if a word ends in
> "MB" (following
> http://aspell.sourceforge.net/metaphone/metaphone-kuhn.txt). Both
> branches of the code above append the 'B', so it is never dropped. The
> Metaphone of "COMB" should be "KM", not "KMB", and the Metaphone of
> "TOMB" should be "TM", not "TMB". I refactored this code a bit and came
> up with the following:
>
> case 'B' :
>     if ( isPreviousChar(local, n, 'M') &&
>          isLastChar(wdsz, n) ) {
>         // B is silent if word ends in MB
>         break;
>     } else {
>         code.append(symb);
>     }
>     break;
>
> Also, this code was copied outright from a C++ program; there was no
> need to keep track of the length of our StringBuffer in a variable
> named "mtsz". That variable is gone, and the only reason this
> refactoring was possible was great code coverage.
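
For illustration, the silent-B rule described above can be sketched as a small standalone Java class. This is not the Commons Codec source: the bodies of isPreviousChar and isLastChar are assumptions inferred from how they are called in the refactored snippet, and the class name SilentBSketch and the method isSilentB are hypothetical.

```java
// Minimal sketch of the "B is silent if the word ends in MB" check.
// Helper bodies are assumed from the call sites in the email, not
// copied from Commons Codec.
class SilentBSketch {

    // Assumed helper: true if the character just before index n is c.
    public static boolean isPreviousChar(String word, int n, char c) {
        return n > 0 && n < word.length() && word.charAt(n - 1) == c;
    }

    // Assumed helper: true if index n is the last position in a word
    // of length wdsz.
    public static boolean isLastChar(int wdsz, int n) {
        return n + 1 == wdsz;
    }

    // Hypothetical wrapper: true when the 'B' at index n should be
    // skipped, i.e. it follows 'M' and is the final letter.
    public static boolean isSilentB(String word, int n) {
        return word.charAt(n) == 'B'
                && isPreviousChar(word, n, 'M')
                && isLastChar(word.length(), n);
    }

    public static void main(String[] args) {
        String[] words = {"COMB", "TOMB", "LUMBER", "BOB"};
        for (String w : words) {
            boolean silent = false;
            for (int n = 0; n < w.length(); n++) {
                if (isSilentB(w, n)) {
                    silent = true;
                }
            }
            System.out.println(w + " has a silent trailing B: " + silent);
        }
    }
}
```

Only a 'B' that both follows 'M' and ends the word is skipped, so the 'B' in "LUMBER" and both in "BOB" would still be encoded.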