
| Key: | CODEC-17 |
| Type: | Bug |
| Status: | Closed |
| Resolution: | Fixed |
| Priority: | Major |
| Assignee: | Unassigned |
| Reporter: | Tim O'Brien |
| Votes: | 0 |
| Watchers: | 0 |
Environment:
|
Operating System: other
Platform: Other
Description
|
The case for 'B' contains an error: if a word ends in "MB" (e.g. "COMB"),
Metaphone should not add B to the code.
commons-dev for the record:
> I uncovered a potential bug in Metaphone. The code in question deals with
> the encoding of 'B':
>
> // START CODE from Metaphone
>
> case 'B' :
>     if ((n > 0) && !(n + 1 == wdsz) &&
>         (local.charAt(n - 1) == 'M')) { // not MB at end of word
>         code.append(symb);
>     } else {
>         code.append(symb);
>     }
>     mtsz++;
>     break;
>
> // END CODE
>
> My understanding is that we should not encode a 'B' if a word ends in
> "MB" (following
> http://aspell.sourceforge.net/metaphone/metaphone-kuhn.txt). So
> the Metaphone of "COMB" is "KM" not "KMB", and the Metaphone of "TOMB" is
> "TM" not "TMB". I "refactored" this code a bit and came up with the
> following:
>
> case 'B' :
>     if ( isPreviousChar(local, n, 'M') &&
>          isLastChar(wdsz, n) ) {
>         // B is silent if word ends in MB
>         break;
>     } else {
>         code.append(symb);
>     }
>     break;
>
> Also, this code was copied outright from a C++ program; there was no
> need to keep track of the length of our StringBuffer in a variable
> named "mtsz".
> That's gone, and the only reason removing it was possible was great code
> coverage.
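
For reference, the silent-B rule described in the email can be sketched in isolation. The email does not show the bodies of isPreviousChar and isLastChar, so the implementations below are assumptions inferred from how they are called, and the class SilentBSketch and method dropSilentB are hypothetical names used only for illustration. The sketch applies only the 'B' rule and copies every other character through unchanged, so its output is not a real Metaphone code.

```java
import java.util.Locale;

public class SilentBSketch {

    // Assumed helper (body not shown in the email): true if the character
    // immediately before index n equals c.
    static boolean isPreviousChar(CharSequence word, int n, char c) {
        return n > 0 && n < word.length() && word.charAt(n - 1) == c;
    }

    // Assumed helper (body not shown in the email): true if index n is the
    // last position in a word of length wdsz.
    static boolean isLastChar(int wdsz, int n) {
        return n + 1 == wdsz;
    }

    // Applies only the 'B' rule: a 'B' is dropped when it is the final
    // character and is preceded by 'M'. All other characters are copied as-is.
    static String dropSilentB(String word) {
        String local = word.toUpperCase(Locale.ENGLISH);
        int wdsz = local.length();
        StringBuilder code = new StringBuilder();
        for (int n = 0; n < wdsz; n++) {
            char symb = local.charAt(n);
            if (symb == 'B' && isPreviousChar(local, n, 'M') && isLastChar(wdsz, n)) {
                continue; // B is silent if the word ends in MB
            }
            code.append(symb);
        }
        return code.toString();
    }

    public static void main(String[] args) {
        System.out.println(dropSilentB("COMB"));   // prints COM
        System.out.println(dropSilentB("TOMB"));   // prints TOM
        System.out.println(dropSilentB("NUMBER")); // prints NUMBER (MB is not at the end)
    }
}
```

The "NUMBER" case is the one the original condition was trying to distinguish: an "MB" in the middle of a word still encodes the B, while a trailing "MB" does not.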