Uploaded image for project: 'Commons Lang'
  1. Commons Lang
  2. LANG-413

Memory usage improvement for StringUtils#getLevenshteinDistance()

    XMLWordPrintableJSON

Details

    • Improvement
    • Status: Closed
    • Minor
    • Resolution: Fixed
    • 2.3
    • 2.4
    • lang.*
    • None

    Description

      StringUtils#getLevenshteinDistance() has recently been optimized to conserve a whole lot of memory, which is good. Since the algorithm is symmetric, we can go even further by selecting the shortest of the 2 input Strings to work upon :

      Index: /Documents/workspace/commons-lang/src/java/org/apache/commons/lang/StringUtils.java
      ===================================================================
      — /Documents/workspace/commons-lang/src/java/org/apache/commons/lang/StringUtils.java (revision 629728)
      +++ /Documents/workspace/commons-lang/src/java/org/apache/commons/lang/StringUtils.java (working copy)
      @@ -5737,6 +5737,15 @@
      return n;
      }

      + if (n > m)

      { + // swap the input strings to consume less memory in p[] and d[] + String tmp = s; + s = t; + t = tmp; + n = m; + m = t.length(); + }

      +
      int p[] = new int[n+1]; //'previous' cost array, horizontally
      int d[] = new int[n+1]; // cost array, horizontally
      int _d[]; //placeholder to assist in swapping p and d

      Attachments

        1. StringUtils-Levenshtein.patch
          0.8 kB
          Cédrik LIME

        Activity

          People

            Unassigned Unassigned
            cedrik.lime@gmail.com Cédrik LIME
            Votes:
            0 Vote for this issue
            Watchers:
            0 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: