  1. Commons Lang
  2. LANG-1199

Fix implementation of StringUtils.getJaroWinklerDistance()

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.4
    • Fix Version/s: 3.5
    • Component/s: lang.*
    • Labels: None

    Description

      The current implementation of StringUtils.getJaroWinklerDistance() does not compute the correct result in some cases. See #LANG-944 for the initial code contribution.

      StringUtils.getJaroWinklerDistance("Haus Ingeborg", "Ingeborg Esser") == 0.0

      This is due to the incorrect computation of common characters, which causes the algorithm to exit prematurely.

      In contrast, the implementation in Lucene gives ~0.63, which is about right.

      JaroWinklerDistance d = new JaroWinklerDistance();
      d.getDistance("Haus Ingeborg", "Ingeborg Esser");

      See https://lucene.apache.org/core/3_0_3/api/contrib-spellchecker/org/apache/lucene/search/spell/JaroWinklerDistance.html
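For reference, here is a minimal sketch of a textbook Jaro-Winkler computation. It is not the patch applied for this issue and not Lucene's code. The class name `JaroWinklerSketch`, the method name `similarity`, the 0.7 boost threshold, the 0.1 prefix scale, and the integer halving of out-of-order matches are all assumptions made for illustration. The key point is that common characters are counted within a sliding match window and each character of the second string is matched at most once.

```java
// Hypothetical illustration only; names and constants are assumptions.
final class JaroWinklerSketch {

    static double similarity(String a, String b) {
        // Characters count as common only if they are at most 'window' positions apart.
        int window = Math.max(0, Math.max(a.length(), b.length()) / 2 - 1);
        boolean[] aMatched = new boolean[a.length()];
        boolean[] bMatched = new boolean[b.length()];
        int matches = 0;

        for (int i = 0; i < a.length(); i++) {
            int lo = Math.max(0, i - window);
            int hi = Math.min(b.length() - 1, i + window);
            for (int j = lo; j <= hi; j++) {
                // Each character of b may be consumed by one match only.
                if (!bMatched[j] && a.charAt(i) == b.charAt(j)) {
                    aMatched[i] = true;
                    bMatched[j] = true;
                    matches++;
                    break;
                }
            }
        }
        if (matches == 0) {
            return 0.0; // no common characters at all
        }

        // Walk both matched sequences in order and count positions that differ.
        int outOfOrder = 0;
        int j = 0;
        for (int i = 0; i < a.length(); i++) {
            if (!aMatched[i]) {
                continue;
            }
            while (!bMatched[j]) {
                j++;
            }
            if (a.charAt(i) != b.charAt(j)) {
                outOfOrder++;
            }
            j++;
        }
        int transpositions = outOfOrder / 2; // assumption: integer halving

        double m = matches;
        double jaro = (m / a.length() + m / b.length() + (m - transpositions) / m) / 3.0;

        // Assumption: the Winkler prefix boost applies only to scores of 0.7 or more.
        if (jaro < 0.7) {
            return jaro;
        }
        int prefix = 0;
        int maxPrefix = Math.min(4, Math.min(a.length(), b.length()));
        while (prefix < maxPrefix && a.charAt(prefix) == b.charAt(prefix)) {
            prefix++;
        }
        return jaro + prefix * 0.1 * (1.0 - jaro);
    }
}
```

Under these assumptions, `JaroWinklerSketch.similarity("Haus Ingeborg", "Ingeborg Esser")` finds nine common characters (the space and the eight letters of "Ingeborg") and evaluates to roughly 0.63, in line with the Lucene figure quoted above.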

          People

            Assignee: Pascal Schumacher (pascalschumacher)
            Reporter: M. Steiger (msteiger)
            Votes: 2
            Watchers: 5

            Dates

              Created:
              Updated:
              Resolved:
