
Key: CODEC-17
Type: Bug
Status: Closed
Resolution: Fixed
Priority: Major
Assignee: Unassigned
Reporter: Tim O'Brien
Votes: 0
Watchers: 0
Commons Codec

[codec] Metaphone B not handling ending MB correctly

Created: 19/Apr/04 03:48 AM   Updated: 27/Oct/07 06:49 AM
Component/s: None
Affects Version/s: Nightly Builds
Fix Version/s: 1.3

Time Tracking:
Not Specified

Environment:
Operating System: other
Platform: Other

Bugzilla Id: 28457


Description:
Error in the case for 'B': if a word ends in "MB" (e.g. "COMB"), Metaphone
should not add B to the code.

Tim O'Brien added a comment - 19/Apr/04 09:39 AM
This issue has been addressed. Here is an excerpt from one of my emails to
commons-dev, for the record:

> I uncovered a potential bug in Metaphone. The code in question deals with
> the encoding of 'B':
>
> // START CODE from Metaphone
>
> case 'B' :
>     if ((n > 0) && !(n + 1 == wdsz) &&
>         (local.charAt(n - 1) == 'M')) { // not MB at end of word
>         code.append(symb);
>     } else {
>         code.append(symb);
>     }
>     mtsz++;
>     break;
>
> // END CODE
>
> My understanding is that we should not encode a 'B' if a word ends in
> "MB" (following http://aspell.sourceforge.net/metaphone/metaphone-kuhn.txt).
> So the Metaphone of "COMB" is "KM" not "KMB", and the Metaphone of "TOMB"
> is "TM" not "TMB". I "refactored" this code a bit and came up with the
> following:
>
> case 'B' :
>     if ( isPreviousChar(local, n, 'M') &&
>          isLastChar(wdsz, n) ) {
>         // B is silent if word ends in MB
>         break;
>     } else {
>         code.append(symb);
>     }
>     break;
>
> Also, this code was (outright) copied from a C++ program; there was no
> need to keep track of the length of our StringBuffer in a variable
> named "mtsz". That's gone, and the only reason this was possible was
> great code coverage.
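
For readers who want to try the "silent B after M at end of word" rule in isolation, here is a minimal sketch. It is not the Commons Codec source: the class name SilentBSketch and the method dropSilentFinalB are hypothetical, and the bodies of isPreviousChar and isLastChar are assumptions inferred from how the excerpt above calls them. Only the 'B' rule is modelled, so the result is not a Metaphone code; every other letter passes through unchanged.

```java
import java.util.Locale;

// Hypothetical sketch: models only the 'B' case discussed in this issue.
final class SilentBSketch {

    // Assumed semantics: true if the character just before index n equals c.
    static boolean isPreviousChar(CharSequence word, int n, char c) {
        return n > 0 && n < word.length() && word.charAt(n - 1) == c;
    }

    // Assumed semantics: true if index n is the last position of a word of length wdsz.
    static boolean isLastChar(int wdsz, int n) {
        return n + 1 == wdsz;
    }

    // Copies the upper-cased word, dropping a final 'B' that follows 'M'.
    // This is NOT a full Metaphone encoding.
    static String dropSilentFinalB(String word) {
        String local = word.toUpperCase(Locale.ENGLISH);
        int wdsz = local.length();
        StringBuilder code = new StringBuilder(wdsz);
        for (int n = 0; n < wdsz; n++) {
            char symb = local.charAt(n);
            if (symb == 'B' && isPreviousChar(local, n, 'M') && isLastChar(wdsz, n)) {
                continue; // B is silent if word ends in MB
            }
            code.append(symb);
        }
        return code.toString();
    }

    public static void main(String[] args) {
        for (String w : new String[] {"COMB", "TOMB", "COMBO", "BAT"}) {
            System.out.println(w + " -> " + dropSilentFinalB(w));
        }
    }
}
```

The B in "COMBO" is kept because it is not the last character, which is the distinction the original condition tried to express before both of its branches ended up appending the symbol.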


Henri Yandell made changes - 16/May/06 09:40 AM
Field Original Value New Value
issue.field.bugzillaimportkey 28457 12341408
Henri Yandell made changes - 16/May/06 11:16 AM
Project Commons [ 12310458 ] Commons Codec [ 12310464 ]
Assignee Jakarta Commons Developers Mailing List [ commons-dev@jakarta.apache.org ]
Affects Version/s Nightly Builds [ 12311648 ]
Key COM-1256 CODEC-17
Component/s Codec [ 12311105 ]
Henri Yandell made changes - 16/May/06 12:26 PM
Affects Version/s Nightly Builds [ 12311728 ]
Henri Yandell made changes - 14/Jul/06 12:23 PM
Fix Version/s 1.3 [ 12311737 ]
Bugzilla Id 28457
Henri Yandell made changes - 27/Oct/07 06:49 AM
Status Resolved [ 5 ] Closed [ 6 ]