  1. Apache MADlib
  2. MADLIB-1059

Add additional distance metrics for k-NN

Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: v1.13
    • Component/s: k-NN

    Description

      Follow-on from https://issues.apache.org/jira/browse/MADLIB-927,
      which supports only one distance function. This JIRA is to:

      (1) Add additional distance metrics. The model to follow is
      http://madlib.incubator.apache.org/docs/latest/group__grp__kmeans.html

      fn_dist (optional)
      TEXT, default: 'squared_dist_norm2'. The name of the function to use to calculate the distance between data points.

      The following distance functions can be used (computation of barycenter/mean in parentheses):

      dist_norm1: 1-norm/Manhattan (element-wise median [Note that MADlib does not provide a median aggregate function for support and performance reasons.])
      dist_norm2: 2-norm/Euclidean (element-wise mean)
      squared_dist_norm2: squared Euclidean distance (element-wise mean)
      dist_angle: angle (element-wise mean of normalized points)
      dist_tanimoto: tanimoto (element-wise mean of normalized points [5])
      user defined function with signature DOUBLE PRECISION[] x, DOUBLE PRECISION[] y -> DOUBLE PRECISION

      Also check whether there are other distance functions under
      http://madlib.apache.org/docs/latest/group__grp__linalg.html
      that might make sense to include, in addition to the ones listed above.
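To make the fn_dist option concrete, here is a minimal Python sketch of the named distance functions and of how a name or a user-defined callable might be resolved. These are plain-Python stand-ins written for illustration, not MADlib's implementations; the `DISTANCES` table and `resolve_distance` helper are hypothetical names, and the angle and Tanimoto formulas are the usual textbook definitions, assumed here to match what the linalg module computes.

```python
import math


def _dot(x, y):
    return sum(a * b for a, b in zip(x, y))


def dist_norm1(x, y):
    # 1-norm / Manhattan distance
    return sum(abs(a - b) for a, b in zip(x, y))


def squared_dist_norm2(x, y):
    # squared Euclidean distance
    return sum((a - b) ** 2 for a, b in zip(x, y))


def dist_norm2(x, y):
    # 2-norm / Euclidean distance
    return math.sqrt(squared_dist_norm2(x, y))


def dist_angle(x, y):
    # angle between the two vectors, in radians (assumes non-zero vectors)
    cos = _dot(x, y) / (math.sqrt(_dot(x, x)) * math.sqrt(_dot(y, y)))
    return math.acos(max(-1.0, min(1.0, cos)))  # clamp against rounding error


def dist_tanimoto(x, y):
    # 1 - Tanimoto similarity (assumes the vectors are not both zero)
    d = _dot(x, y)
    return 1.0 - d / (_dot(x, x) + _dot(y, y) - d)


# Hypothetical lookup table keyed by the names listed in the issue.
DISTANCES = {
    "dist_norm1": dist_norm1,
    "dist_norm2": dist_norm2,
    "squared_dist_norm2": squared_dist_norm2,
    "dist_angle": dist_angle,
    "dist_tanimoto": dist_tanimoto,
}


def resolve_distance(fn_dist="squared_dist_norm2"):
    # Accept either a built-in name or a user-defined function (x, y) -> float.
    if callable(fn_dist):
        return fn_dist
    return DISTANCES[fn_dist]
```

A caller would then write, for example, `resolve_distance("dist_norm1")([0, 0], [3, 4])` to get the Manhattan distance between two points, or pass its own two-argument function in place of the name.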

      (2) Add an option for weighted average in the voting.
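
One way to read item (2) is sketched below in Python. The issue does not say which weighting scheme to use, so inverse-distance weighting is an assumption here, as are the function names `knn_vote` and `knn_average` and the `eps` guard against division by zero.

```python
from collections import defaultdict


def knn_vote(neighbors, weighted=False, eps=1e-12):
    """Classification: pick a label from the k nearest neighbors.

    neighbors: list of (distance, label) pairs.
    weighted:  if True, each vote counts 1 / distance (assumed scheme);
               otherwise every neighbor gets one vote.
    """
    scores = defaultdict(float)
    for dist, label in neighbors:
        scores[label] += 1.0 / (dist + eps) if weighted else 1.0
    return max(scores, key=scores.get)


def knn_average(neighbors, weighted=False, eps=1e-12):
    """Regression: average the neighbors' values, optionally weighted.

    neighbors: list of (distance, value) pairs.
    """
    weights = [1.0 / (d + eps) if weighted else 1.0 for d, _ in neighbors]
    total = sum(w * v for w, (_, v) in zip(weights, neighbors))
    return total / sum(weights)
```

With neighbors `[(0.1, "a"), (1.0, "b"), (1.2, "b")]`, the unweighted vote favors `"b"` by count, while the weighted vote favors `"a"` because the single very close neighbor outweighs the two distant ones.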

            People

              Assignee: Rahul Iyer (riyer)
              Reporter: Frank McQuillan (fmcquillan)
              Votes: 0
              Watchers: 3
