Spark / SPARK-43493

Add a max distance argument to the levenshtein() function

Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.4.0
    • Fix Version/s: 3.5.0
    • Component/s: SQL
    • Labels: None

    Description

      Currently, Spark's levenshtein(str1, str2) function can be very inefficient for long strings. Many other databases that provide a built-in function of this kind also accept a third argument: a maximum distance beyond which the algorithm may terminate early.

      For example, something like:

      levenshtein(str1, str2[, max_distance])
      

      The function stops computing the distance once the maximum value is reached.
      See PostgreSQL for an example of a three-argument levenshtein.
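
      To illustrate the early-termination idea, here is a minimal Python sketch, not Spark's implementation. The name bounded_levenshtein and the choice to return -1 once the distance is known to exceed max_distance are assumptions made for this example; the ticket itself does not specify what is returned in that case. The sketch only exits early; it does not restrict the computation to a diagonal band, which a production version would likely also do.

```python
from typing import Optional


def bounded_levenshtein(a: str, b: str, max_distance: Optional[int] = None) -> int:
    """Return the Levenshtein distance between a and b.

    Hypothetical convention for this sketch: return -1 as soon as the
    distance is known to exceed max_distance.
    """
    if max_distance is not None and max_distance < 0:
        raise ValueError("max_distance must be non-negative")

    # The length difference is a lower bound on the edit distance.
    if max_distance is not None and abs(len(a) - len(b)) > max_distance:
        return -1

    # Iterate over the longer string so each row is as short as possible.
    if len(a) < len(b):
        a, b = b, a

    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        # The row minimum never decreases from one row to the next, so once
        # it exceeds the bound the final distance must exceed it as well.
        if max_distance is not None and min(curr) > max_distance:
            return -1
        prev = curr

    distance = prev[-1]
    if max_distance is not None and distance > max_distance:
        return -1
    return distance
```

      With a small bound, two long and very different strings are rejected after a few rows instead of after filling the whole table.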

          People

            Assignee: panbingkun Pan Bingkun
            Reporter: maxgekk Max Gekk
            Votes: 0
            Watchers: 4
