Mahout / MAHOUT-181

DistanceMeasure is broken: iteration is done over nonZeroElements of v1.plus(v2), not v1.minus(v2)

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.2
    • Fix Version/s: 0.2
    • Component/s: classic
    • Labels: None
    • Environment: all

    Description

      SquaredEuclideanDistanceMeasure iterates over the nonzero elements of v1.plus(v2). That visits the right set of indices as long as v1.get(i) != -v2.get(i) for every index i of a nonzero element, but any index where the two entries cancel is skipped. For example, the simple case of SquaredEuclideanDistanceMeasure.distance(v, v.assign(new NegateFunction())) yields zero on current trunk, instead of 4*v.lengthSquared().
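
      The failure mode described above can be reproduced without Mahout. The following is only a minimal sketch: it models sparse vectors as plain Map<Integer, Double> instances, and the class and method names (SparseDistanceSketch, squaredDistanceOverSum, squaredDistanceOverDifference, negate) are hypothetical, not the code on trunk or in the patch. It contrasts visiting the nonzero indices of the sum with visiting the nonzero indices of the difference.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class SparseDistanceSketch {

    /** Models the bug: only indices where v1 + v2 is nonzero are visited. */
    public static double squaredDistanceOverSum(Map<Integer, Double> v1, Map<Integer, Double> v2) {
        double result = 0.0;
        for (int i : unionOfIndices(v1, v2)) {
            double a = v1.getOrDefault(i, 0.0);
            double b = v2.getOrDefault(i, 0.0);
            if (a + b != 0.0) { // i is a nonzero element of the sum
                result += (a - b) * (a - b);
            }
        }
        return result;
    }

    /** Models the fix: indices where v1 - v2 is nonzero are visited. */
    public static double squaredDistanceOverDifference(Map<Integer, Double> v1, Map<Integer, Double> v2) {
        double result = 0.0;
        for (int i : unionOfIndices(v1, v2)) {
            double delta = v1.getOrDefault(i, 0.0) - v2.getOrDefault(i, 0.0);
            if (delta != 0.0) { // i is a nonzero element of the difference
                result += delta * delta;
            }
        }
        return result;
    }

    /** All indices stored in either vector. */
    public static Set<Integer> unionOfIndices(Map<Integer, Double> v1, Map<Integer, Double> v2) {
        Set<Integer> indices = new HashSet<>(v1.keySet());
        indices.addAll(v2.keySet());
        return indices;
    }

    /** Returns a new vector with every entry negated; the input is not modified. */
    public static Map<Integer, Double> negate(Map<Integer, Double> v) {
        Map<Integer, Double> negated = new HashMap<>();
        for (Map.Entry<Integer, Double> e : v.entrySet()) {
            negated.put(e.getKey(), -e.getValue());
        }
        return negated;
    }

    public static void main(String[] args) {
        Map<Integer, Double> v = new HashMap<>();
        v.put(0, 1.0);
        v.put(3, 2.0);
        v.put(7, 3.0); // squared length is 1 + 4 + 9 = 14

        Map<Integer, Double> negV = negate(v);
        System.out.println(squaredDistanceOverSum(v, negV));        // prints 0.0
        System.out.println(squaredDistanceOverDifference(v, negV)); // prints 56.0
    }
}
```

      With v = (1, 2, 3) on three indices, every entry of v + (-v) cancels, so the first variant skips all of them and reports 0, while the second reports 56, which is 4 times the squared length of v.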

      Attached is a patch with a unit test which checks that DistanceMeasure.distance always returns nonnegative results and in particular also does not return , as well as a fix for ManhattanDistanceMeasure, SquaredEuclideanDistanceMeasure, and EuclideanDistanceMeasure.

      Unfortunately, the attached unit test reveals that the TanimotoDistanceMeasure is more broken than I can fix at present. It doesn't appear to apply the Wikipedia formula it references correctly, and in fact sometimes returns negative results. This means that with this patch applied, TestTanimotoDistanceMeasure is failing (and rightfully so).
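
      For reference, the Tanimoto formula as given on Wikipedia is T(a, b) = a·b / (|a|² + |b|² - a·b), with the distance taken as 1 - T. The sketch below is an illustration of that formula only, not the fix in either attached patch: the class TanimotoSketch and its methods are hypothetical, vectors are again modeled as Map<Integer, Double>, and returning 0 for two all-zero vectors is an assumption made here.

```java
import java.util.HashMap;
import java.util.Map;

public class TanimotoSketch {

    /** Dot product of two sparse vectors; missing entries count as 0. */
    public static double dot(Map<Integer, Double> a, Map<Integer, Double> b) {
        double sum = 0.0;
        for (Map.Entry<Integer, Double> e : a.entrySet()) {
            sum += e.getValue() * b.getOrDefault(e.getKey(), 0.0);
        }
        return sum;
    }

    /** 1 - a.b / (|a|^2 + |b|^2 - a.b), clamped so rounding cannot push it below 0. */
    public static double tanimotoDistance(Map<Integer, Double> a, Map<Integer, Double> b) {
        double ab = dot(a, b);
        double denominator = dot(a, a) + dot(b, b) - ab;
        if (denominator == 0.0) {
            return 0.0; // both vectors are all zeros; treated as identical (assumption)
        }
        return Math.max(0.0, 1.0 - ab / denominator);
    }

    public static void main(String[] args) {
        Map<Integer, Double> a = new HashMap<>();
        a.put(0, 1.0);
        Map<Integer, Double> b = new HashMap<>();
        b.put(1, 1.0);

        System.out.println(tanimotoDistance(a, a)); // identical vectors
        System.out.println(tanimotoDistance(a, b)); // orthogonal vectors
    }
}
```

      Because |a - b|² >= 0 implies a·b <= |a|² + |b|² - a·b, the ratio never exceeds 1 in exact arithmetic, so a correct implementation of this formula should not produce negative distances; the clamp only guards against floating-point rounding.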

      Attachments

        1. MAHOUT-181.patch
          5 kB
          Jake Mannix
        2. MAHOUT-181-with-TanimotoFix.patch
          8 kB
          Jake Mannix

          People

            Assignee: Grant Ingersoll (gsingers)
            Reporter: Jake Mannix (jake.mannix)