Commons Text / TEXT-130

JaroWinklerDistance: Wrong results due to precision of transpositions


Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.4
    • Fix Version/s: 1.5
    • Labels: None

    Description

      The method JaroWinklerDistance#matches returns transpositions / 2, computed with integer division. However, transpositions is not guaranteed to be even. For example, comparing "aaabcd" and "aaacdb" results in transpositions = 3, so the method must return 1.5, not 1. Otherwise the similarity is 0.9611111111111111 instead of 0.9416666666666667.
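      For reference, both similarity values follow from the standard Jaro-Winkler formulas. Both strings have length 6, all m = 6 characters match within the match window, and the matched sequences "aaabcd" and "aaacdb" differ at their last three positions (b/c, c/d, d/b), giving a count of 3 whose half is t. The common prefix is ℓ = 3 ("aaa"), and the scaling factor is assumed to be the conventional default p = 0.1:

```latex
\begin{aligned}
d_j &= \tfrac{1}{3}\left(\tfrac{m}{|s_1|} + \tfrac{m}{|s_2|} + \tfrac{m - t}{m}\right),
  \qquad d_w = d_j + \ell\,p\,(1 - d_j) \\
% Truncated (integer division): t = \lfloor 3/2 \rfloor = 1
d_j &= \tfrac{1}{3}\left(1 + 1 + \tfrac{5}{6}\right) = \tfrac{17}{18},
  \qquad d_w = \tfrac{17}{18} + 0.3 \cdot \tfrac{1}{18} = \tfrac{17.3}{18} \approx 0.961111 \\
% Correct: t = 3/2 = 1.5
d_j &= \tfrac{1}{3}\left(1 + 1 + \tfrac{4.5}{6}\right) = \tfrac{11}{12},
  \qquad d_w = \tfrac{11}{12} + 0.3 \cdot \tfrac{1}{12} = \tfrac{11.3}{12} \approx 0.941667
\end{aligned}
```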

      I recommend returning halfTranspositions instead of transpositions, and performing the cast and division ((double) mtp[1] / 2) in JaroWinklerDistance#apply.
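      A minimal sketch of the recommended shape follows. The class name JaroWinklerSketch, the constant SCALING_FACTOR (assumed 0.1), the 4-character prefix cap from standard Jaro-Winkler, and the {matches, halfTranspositions, prefix} return layout are illustrative assumptions, not the Commons Text source. The point is that matches() returns the raw, possibly odd, half-transposition count and apply() divides it by 2 in floating point:

```java
import java.util.Arrays;

/** Hypothetical Jaro-Winkler sketch; not the Commons Text implementation. */
public class JaroWinklerSketch {

    // Assumption: the conventional Winkler scaling factor.
    private static final double SCALING_FACTOR = 0.1;

    /** Returns {matches, halfTranspositions, prefix}. */
    static int[] matches(final CharSequence first, final CharSequence second) {
        final CharSequence max;
        final CharSequence min;
        if (first.length() > second.length()) {
            max = first;
            min = second;
        } else {
            max = second;
            min = first;
        }
        // Two characters only count as a match if they are within this distance.
        final int range = Math.max(max.length() / 2 - 1, 0);
        final int[] matchIndexes = new int[min.length()];
        Arrays.fill(matchIndexes, -1);
        final boolean[] matchFlags = new boolean[max.length()];
        int matches = 0;
        for (int mi = 0; mi < min.length(); mi++) {
            final char c1 = min.charAt(mi);
            for (int xi = Math.max(mi - range, 0), xn = Math.min(mi + range + 1, max.length()); xi < xn; xi++) {
                if (!matchFlags[xi] && c1 == max.charAt(xi)) {
                    matchIndexes[mi] = xi;
                    matchFlags[xi] = true;
                    matches++;
                    break;
                }
            }
        }
        // Matched characters of each string, kept in their original order.
        final char[] ms1 = new char[matches];
        final char[] ms2 = new char[matches];
        for (int i = 0, si = 0; i < min.length(); i++) {
            if (matchIndexes[i] != -1) {
                ms1[si++] = min.charAt(i);
            }
        }
        for (int i = 0, si = 0; i < max.length(); i++) {
            if (matchFlags[i]) {
                ms2[si++] = max.charAt(i);
            }
        }
        // Each out-of-order position is half a transposition; the count may be odd.
        int halfTranspositions = 0;
        for (int i = 0; i < matches; i++) {
            if (ms1[i] != ms2[i]) {
                halfTranspositions++;
            }
        }
        // Common prefix, capped at 4 characters as in standard Jaro-Winkler.
        int prefix = 0;
        for (int i = 0; i < Math.min(min.length(), 4); i++) {
            if (first.charAt(i) != second.charAt(i)) {
                break;
            }
            prefix++;
        }
        // The fix: return halfTranspositions itself, not halfTranspositions / 2.
        return new int[] {matches, halfTranspositions, prefix};
    }

    static double apply(final CharSequence left, final CharSequence right) {
        final int[] mtp = matches(left, right);
        final double m = mtp[0];
        if (m == 0) {
            return 0D;
        }
        // Divide in floating point so an odd count gives e.g. 1.5 instead of 1.
        final double transpositions = (double) mtp[1] / 2;
        final double j = (m / left.length() + m / right.length() + (m - transpositions) / m) / 3;
        // Winkler prefix boost, applied only above the usual 0.7 threshold.
        return j < 0.7D ? j : j + SCALING_FACTOR * mtp[2] * (1D - j);
    }

    public static void main(final String[] args) {
        System.out.println(apply("aaabcd", "aaacdb"));
    }
}
```

      Running main should print a value of approximately 0.941667, the corrected similarity from the description, because the 3 mismatched positions now yield t = 1.5 rather than 1.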


            People

              Assignee: Rob Tompkins (chtompki)
              Reporter: Jan Martin Keil (jmkeil)
              Votes: 0
              Watchers: 2
