Hive / HIVE-9556

create UDF to calculate the Levenshtein distance between two strings

Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.2.0
    • Component/s: UDF
    • Labels: None

    Description

      Levenshtein distance is a string metric for measuring the difference between two sequences. Informally, the Levenshtein distance between two words is the minimum number of single-character edits (i.e. insertions, deletions or substitutions) required to change one word into the other. It is named after Vladimir Levenshtein, who considered this distance in 1965.

      Example:
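      The Levenshtein distance between "kitten" and "sitting" is 3, because the following three edits change one into the other and no shorter sequence of edits does: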
      1. kitten → sitten (substitution of "s" for "k")
      2. sitten → sittin (substitution of "i" for "e")
      3. sittin → sitting (insertion of "g" at the end).

      select levenshtein('kitten', 'sitting');
      3
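
For readers who want to see how such a distance can be computed, here is a minimal sketch in Java of the classic dynamic-programming approach (often called Wagner-Fischer), kept to two rows of the table. It is not the code from the attached patches: the class name `LevenshteinSketch` and the method `distance` are hypothetical, chosen for illustration only, and the sketch compares UTF-16 code units rather than full Unicode code points.

```java
// Illustrative sketch only; LevenshteinSketch and distance are hypothetical
// names, not part of Hive or of the HIVE-9556 patches.
final class LevenshteinSketch {
    private LevenshteinSketch() {}

    /** Returns the minimum number of single-character edits turning a into b. */
    static int distance(String a, String b) {
        // prev[j] holds the distance between the first i-1 chars of a
        // and the first j chars of b; curr[j] is the same for i chars.
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];

        // Turning the empty prefix of a into b[0..j) takes j insertions.
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }

        for (int i = 1; i <= a.length(); i++) {
            // Turning a[0..i) into the empty string takes i deletions.
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = (a.charAt(i - 1) == b.charAt(j - 1)) ? 0 : 1;
                int insertion = curr[j - 1] + 1;
                int deletion = prev[j] + 1;
                int substitution = prev[j - 1] + cost;
                curr[j] = Math.min(Math.min(insertion, deletion), substitution);
            }
            // Reuse the two arrays instead of allocating a new row each time.
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        // After the final swap, prev holds the last completed row.
        return prev[b.length()];
    }

    public static void main(String[] args) {
        System.out.println(distance("kitten", "sitting")); // prints 3
    }
}
```

The two-row layout keeps memory proportional to the length of the second string; a full table would only be needed to recover the edit sequence itself.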
      

      Attachments

        1. HIVE-9556.3.patch
          16 kB
          Alexander Pivovarov
        2. HIVE-9556.2.patch
          20 kB
          Alexander Pivovarov
        3. HIVE-9556.1.patch
          20 kB
          Alexander Pivovarov

          People

            Assignee: apivovarov Alexander Pivovarov
            Reporter: apivovarov Alexander Pivovarov
            Votes: 0
            Watchers: 5