Hive / HIVE-9738

Create SOUNDEX UDF


Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.2.0
    • Component/s: UDF
    • Labels: None

    Description

      Soundex is a phonetic encoding used to relate similar-sounding names, but it can also serve as a general-purpose scheme for finding words with similar phonemes.

      The American Soundex System
      The soundex code consists of the first letter of the name followed by three digits. The digits are obtained by dropping the letters a, e, i, o, u, h, w and y and coding the remaining letters of the name according to the table below. There are only two additional rules: (1) if two or more consecutive letters have the same code, they are coded as one letter; (2) if there are too few letters to produce three digits, the remaining digits are set to zero.

      Soundex Table
      Digit  Letters
      1      b, f, p, v
      2      c, g, j, k, q, s, x, z
      3      d, t
      4      l
      5      m, n
      6      r
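The table can be held as a small letter-to-digit lookup. The Java sketch below is illustrative only: the class name `SoundexTable` and the method `code` are hypothetical and not taken from the attached patches. Letters the table does not list (a, e, i, o, u, h, w, y) map to the placeholder '0', which is distinct from the coded digits 1 to 6.

```java
public class SoundexTable {
    // One string per digit, in table order: GROUPS[0] holds the letters coded 1, and so on.
    private static final String[] GROUPS = {"bfpv", "cgjkqsxz", "dt", "l", "mn", "r"};

    /** Returns the table digit for a letter, or '0' for letters the table does not list. */
    public static char code(char letter) {
        char c = Character.toLowerCase(letter);
        for (int i = 0; i < GROUPS.length; i++) {
            if (GROUPS[i].indexOf(c) >= 0) {
                return (char) ('1' + i);
            }
        }
        return '0'; // a, e, i, o, u, h, w, y (and non-letters) carry no digit
    }

    public static void main(String[] args) {
        StringBuilder codes = new StringBuilder();
        for (char c : "MILLER".toCharArray()) {
            codes.append(code(c));
        }
        System.out.println(codes); // 504406
    }
}
```

For MILLER the per-letter codes are 504406. Keeping the first letter as a letter, skipping the placeholders and collapsing the repeated 4 leaves 4 and 6, which rule (2) pads to M460.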

      Examples:
      Miller M460
      Peterson P362
      Peters P362
      Auerbach A612
      Uhrbach U612
      Moskowitz M232
      Moskovitz M213
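Putting the table and the two rules together gives a complete encoder. The Java sketch below is an independent illustration, not the code from the attached patches, and `SoundexSketch` is a hypothetical name. It makes several assumptions the description does not spell out. First, input is lowercased and non-letters are stripped, and null input returns null. Second, the first letter's own code takes part in rule (1), so in "Pfister" the f is not coded again. Third, it follows the usual American Soundex convention in which a vowel or y between two same-coded letters keeps both codes, while an h or w between them does not. The description lists h and w together with the vowels, and none of the examples above distinguish the two readings.

```java
import java.util.Locale;

public class SoundexSketch {
    // Table from the issue: GROUPS[i] holds the letters coded with digit i + 1.
    private static final String[] GROUPS = {"bfpv", "cgjkqsxz", "dt", "l", "mn", "r"};

    private static char code(char c) {
        for (int i = 0; i < GROUPS.length; i++) {
            if (GROUPS[i].indexOf(c) >= 0) {
                return (char) ('1' + i);
            }
        }
        return '0'; // a, e, i, o, u, h, w, y carry no digit
    }

    /** Encodes a word as one letter plus three digits; "" if it has no ASCII letters. */
    public static String encode(String word) {
        if (word == null) {
            return null; // assumption: null in, null out, like most SQL functions
        }
        String s = word.toLowerCase(Locale.ROOT).replaceAll("[^a-z]", "");
        if (s.isEmpty()) {
            return "";
        }
        StringBuilder out = new StringBuilder();
        out.append(Character.toUpperCase(s.charAt(0)));
        // The first letter's code counts for rule (1): in "Pfister", f repeats P's code.
        char last = code(s.charAt(0));
        for (int i = 1; i < s.length() && out.length() < 4; i++) {
            char ch = s.charAt(i);
            char digit = code(ch);
            if (digit != '0') {
                if (digit != last) {
                    out.append(digit); // rule (1): a run of equal codes is written once
                }
                last = digit;
            } else if (ch != 'h' && ch != 'w') {
                last = '0'; // vowels and y end a run; h and w are skipped without ending it
            }
        }
        // Rule (2): pad with zeros when there are too few coded letters.
        while (out.length() < 4) {
            out.append('0');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String[] names = {"Miller", "Peterson", "Peters", "Auerbach", "Uhrbach", "Moskowitz", "Moskovitz"};
        for (String name : names) {
            System.out.println(name + " " + encode(name));
        }
    }
}
```

Running `main` prints each example name with its code, and every code matches the examples listed in the description. The h/w handling only shows up on names such as "Ashcraft", which this sketch encodes as A261 rather than A226.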

      Implementation:
      http://commons.apache.org/proper/commons-codec/apidocs/org/apache/commons/codec/language/Soundex.html

      Attachments

        1. HIVE-9738.2.patch (13 kB, Alexander Pivovarov)
        2. HIVE-9738.1.patch (13 kB, Alexander Pivovarov)


          People

            Assignee: Alexander Pivovarov (apivovarov)
            Reporter: Alexander Pivovarov (apivovarov)
            Votes: 0
            Watchers: 3

            Dates

              Created:
              Updated:
              Resolved: