Commons Codec / CODEC-17

[codec] Metaphone B not handling ending MB correctly

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: Nightly Builds
    • Fix Version/s: 1.3
    • Labels:
      None
    • Environment:

      Operating System: other
      Platform: Other

      Description

      Error in the case for 'B': if a word ends in "MB" (e.g. "COMB"),
      Metaphone should not add B to the code.

        Activity

        Tim O'Brien added a comment -

        This issue has been addressed. Here is an excerpt from one of my emails to
        commons-dev, for the record:

        > I uncovered a potential bug in Metaphone. The code in question deals
        > with the encoding of 'B':
        >
        > // START CODE from Metaphone
        >
        > case 'B' :
        >     if ((n > 0) && !(n + 1 == wdsz) &&
        >         (local.charAt(n - 1) == 'M')) { // not MB at end of word
        >         code.append(symb);
        >     } else {
        >         code.append(symb);
        >     }
        >     mtsz++;
        >     break;
        >
        > // END CODE
        >
        > My understanding is that we should not encode a 'B' if a word ends in
        > "MB" (following
        > http://aspell.sourceforge.net/metaphone/metaphone-kuhn.txt). So the
        > Metaphone of "COMB" is "KM" not "KMB", and the Metaphone of "TOMB" is
        > "TM" not "TMB". I "refactored" this code a bit and came up with the
        > following:
        >
        > case 'B' :
        >     if ( isPreviousChar(local, n, 'M') &&
        >          isLastChar(wdsz, n) ) {
        >         // B is silent if word ends in MB
        >         break;
        >     } else {
        >         code.append(symb);
        >     }
        >     break;
        >
        > Also, this code was (outright) copied from a C++ program; there was no
        > need to keep track of the length of our StringBuffer in a variable
        > named "mtsz". That's gone, and the only reason this was possible was
        > great code coverage.
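
        For anyone reading this later, here is a minimal, self-contained sketch of
        the 'B' rule discussed above. It is not the Commons Codec source: the class
        name SilentBSketch and the method bOnly are made up for illustration, the
        bodies of isPreviousChar and isLastChar are my guess at what such helpers
        would do, and the input is assumed to be upper-case already. It applies
        only the 'B' rule, not the rest of Metaphone.

```java
final class SilentBSketch {

    // Assumed helper body: true when the character just before index n is c.
    static boolean isPreviousChar(String word, int n, char c) {
        return n > 0 && n < word.length() && word.charAt(n - 1) == c;
    }

    // Assumed helper body: true when n is the last index of a word of length wdsz.
    static boolean isLastChar(int wdsz, int n) {
        return n + 1 == wdsz;
    }

    // Mirrors the original logic: both branches append, so 'B' is always emitted.
    static void appendBOriginal(StringBuilder code, String local, int n) {
        int wdsz = local.length();
        if ((n > 0) && !(n + 1 == wdsz) && (local.charAt(n - 1) == 'M')) {
            code.append('B');
        } else {
            code.append('B');
        }
    }

    // Mirrors the refactored logic: 'B' is skipped when the word ends in "MB".
    static void appendBRefactored(StringBuilder code, String local, int n) {
        if (isPreviousChar(local, n, 'M') && isLastChar(local.length(), n)) {
            return; // B is silent if word ends in MB
        }
        code.append('B');
    }

    // Hypothetical driver: runs only the 'B' rule over a word and returns
    // the B's that would be added to the code.
    static String bOnly(String word, boolean refactored) {
        StringBuilder code = new StringBuilder();
        for (int n = 0; n < word.length(); n++) {
            if (word.charAt(n) == 'B') {
                if (refactored) {
                    appendBRefactored(code, word, n);
                } else {
                    appendBOriginal(code, word, n);
                }
            }
        }
        return code.toString();
    }

    public static void main(String[] args) {
        for (String w : new String[] {"COMB", "TOMB", "COMBO", "BOB"}) {
            System.out.println(w + ": original=[" + bOnly(w, false)
                    + "] refactored=[" + bOnly(w, true) + "]");
        }
    }
}
```

        With this sketch, "COMB" and "TOMB" contribute a 'B' under the original
        logic and none under the refactored logic, while "COMBO" and "BOB" are
        unchanged because the 'B' is not part of a trailing "MB".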


          People

          • Assignee:
            Unassigned
            Reporter:
            Tim O'Brien
          • Votes:
            0
            Watchers:
            0
