  1. Hive
  2. HIVE-9556

Create UDF to calculate the Levenshtein distance between two strings

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.2.0
    • Component/s: UDF
    • Labels: None

      Description

      Levenshtein distance is a string metric for measuring the difference between two sequences. Informally, the Levenshtein distance between two words is the minimum number of single-character edits (i.e. insertions, deletions or substitutions) required to change one word into the other. It is named after Vladimir Levenshtein, who considered this distance in 1965.

      Example:
      The Levenshtein distance between "kitten" and "sitting" is 3:
      1. kitten → sitten (substitution of "s" for "k")
      2. sitten → sittin (substitution of "i" for "e")
      3. sittin → sitting (insertion of "g" at the end).

      select levenshtein('kitten', 'sitting');
      3
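
      For reference, the metric described above can be sketched with the standard two-row dynamic-programming method. This is an illustrative Python sketch only, not the code from the attached patches; the function name levenshtein simply mirrors the proposed UDF name, and the handling of NULL inputs in Hive is not modeled here.

```python
def levenshtein(a: str, b: str) -> int:
    """Return the minimum number of single-character edits turning a into b."""
    if len(a) < len(b):
        a, b = b, a  # the distance is symmetric; keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution (or match)
            ))
        previous = current
    return previous[-1]


print(levenshtein("kitten", "sitting"))  # 3
```

      Only two rows of the edit-distance table are kept at a time, so memory use grows with the length of the shorter string rather than with the product of both lengths.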
      

        Attachments

        1. HIVE-9556.1.patch
          20 kB
          Alexander Pivovarov
        2. HIVE-9556.2.patch
          20 kB
          Alexander Pivovarov
        3. HIVE-9556.3.patch
          16 kB
          Alexander Pivovarov

            People

            • Assignee: Alexander Pivovarov (apivovarov)
            • Reporter: Alexander Pivovarov (apivovarov)
            • Votes: 0
            • Watchers: 5
