Description
Soundex is an encoding used to relate similar names, but it can also be used as a general-purpose scheme to find words with similar phonemes.
The American Soundex System
The Soundex code consists of the first letter of the name followed by three digits. These three digits are determined by dropping the letters a, e, i, o, u, h, w and y after the first letter and taking three digits from the remaining letters of the name according to the table below. There are only two additional rules. (1) If two or more consecutive letters have the same code, they are coded as one letter. (2) If there are too few letters to make the three digits, the remaining digits are set to zero.
Soundex Table
1 b, f, p, v
2 c, g, j, k, q, s, x, z
3 d, t
4 l
5 m, n
6 r
Examples:
Miller M460
Peterson P362
Peters P362
Auerbach A612
Uhrbach U612
Moskowitz M232
Moskovitz M213
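The rules and table above can be sketched in a few lines of Python. This is only an illustrative sketch, not the Hive or Commons Codec implementation: the function name soundex and the CODES dictionary are made up for this example. It assumes that "consecutive" means adjacent in the original spelling (so any dropped letter, including h and w, separates two codes), that the first letter's own code also counts when collapsing duplicates, and that non-letter characters are ignored. Stricter Soundex variants treat h and w differently, which is not handled here.

```python
# Letter-to-digit mapping taken from the Soundex Table above.
CODES = {}
for digit, letters in (("1", "bfpv"), ("2", "cgjkqsxz"), ("3", "dt"),
                       ("4", "l"), ("5", "mn"), ("6", "r")):
    for letter in letters:
        CODES[letter] = digit


def soundex(name):
    """Return a four-character Soundex code for name (illustrative sketch)."""
    # Assumption: only ASCII letters are considered; everything else is skipped.
    letters = [ch for ch in name.lower() if "a" <= ch <= "z"]
    if not letters:
        return ""
    first = letters[0]
    digits = []
    # Assumption: the first letter's code takes part in rule (1).
    previous = CODES.get(first)
    for ch in letters[1:]:
        code = CODES.get(ch)  # None for a, e, i, o, u, h, w, y (dropped)
        # Rule (1): adjacent letters with the same code are coded once.
        if code is not None and code != previous:
            digits.append(code)
        previous = code
    # Rule (2): pad with zeros, then keep the letter plus three digits.
    return (first.upper() + "".join(digits) + "000")[:4]
```

Under these assumptions the sketch reproduces the codes in the examples above, for instance soundex("Moskowitz") gives M232 and soundex("Moskovitz") gives M213.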
Implementation:
http://commons.apache.org/proper/commons-codec/apidocs/org/apache/commons/codec/language/Soundex.html
Issue Links
- is related to HIVE-4053 Add support for phonetic algorithms in Hive (Open)