Uploaded image for project: 'Apache Drill'
  1. Apache Drill
  2. DRILL-6519

Add String Distance and Phonetic Functions

    XMLWordPrintableJSON

Details

    • Improvement
    • Status: Resolved
    • Major
    • Resolution: Fixed
    • None
    • 1.14.0
    • None

    Description

      From a recent project, this collection of functions makes it possible to do fuzzy string matching as well as phonetic matching on strings. 

       

      The following functions are all phonetic functions and map text to a number or string based on how the word sounds.  For instance "Jayme" and "Jaime" have the same soundex values and hence these functions can be used to match similar sounding words.

      • caverphone1( <string> )
      • caverphone2( <string> )
      • cologne_phonetic( <string> )
      • dm_soundex( <string> )
      • double_metaphone(<string>)
      • match_rating_encoder( <string> )
      • metaphone(<string>)
      • nysiis( <string> )
      • refined_soundex(<string>)
      • soundex(<string>)

      Additionally, there is the

      sounds_like(<string1>,<string2>)

      function which can be used to find strings that sound similar.   For instance:

       

      SELECT * 
      FROM <data>
      WHERE sounds_like( last_name, 'Gretsky' )
      

      String Distance Functions

      In addition to the phonetic functions, there are a series of distance functions which measure the difference between two strings.  The functions include:

      • cosine_distance(<string1>,<string2>)
      • fuzzy_score(<string1>,<string2>)
      • hamming_distance (<string1>,<string2>)
      • jaccard_distance (<string1>,<string2>)
      • jaro_distance (<string1>,<string2>)
      • levenshtein_distance (<string1>,<string2>)
      • longest_common_substring_distance(<string1>,<string2>)

       

      Attachments

        Issue Links

          Activity

            People

              cgivre Charles Givre
              cgivre Charles Givre
              Arina Ielchiieva Arina Ielchiieva
              Votes:
              0 Vote for this issue
              Watchers:
              4 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: