Description
Follow-on from https://issues.apache.org/jira/browse/MADLIB927
which supports one distance function. This JIRA is to:
(1) Add additional distance metrics. The model to follow is
http://madlib.incubator.apache.org/docs/latest/group__grp__kmeans.html
fn_dist (optional)
TEXT, default: 'squared_dist_norm2'. The name of the function to use to calculate the distance between data points.
The following distance functions can be used (computation of barycenter/mean in parentheses):
dist_norm1: 1-norm/Manhattan (elementwise median [Note that MADlib does not provide a median aggregate function for support and performance reasons.])
dist_norm2: 2-norm/Euclidean (elementwise mean)
squared_dist_norm2: squared Euclidean distance (elementwise mean)
dist_angle: angle (elementwise mean of normalized points)
dist_tanimoto: tanimoto (elementwise mean of normalized points [5])
user-defined function with signature DOUBLE PRECISION[] x, DOUBLE PRECISION[] y -> DOUBLE PRECISION
Also check whether there are other distance functions under
http://madlib.apache.org/docs/latest/group__grp__linalg.html
that might make sense to include while you are at it, in addition to the ones listed above.
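For reference, the named metrics listed above can be sketched in plain Python as below. This is only an illustration of the definitions, not MADlib's implementation: the angle and Tanimoto formulas are the usual textbook forms, and DISTANCES and resolve_distance are hypothetical names made up here to show how an fn_dist argument with a default might be dispatched.

```python
import math


def dist_norm1(x, y):
    """1-norm (Manhattan) distance."""
    return sum(abs(a - b) for a, b in zip(x, y))


def squared_dist_norm2(x, y):
    """Squared Euclidean distance."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


def dist_norm2(x, y):
    """2-norm (Euclidean) distance."""
    return math.sqrt(squared_dist_norm2(x, y))


def _dot(x, y):
    return sum(a * b for a, b in zip(x, y))


def dist_angle(x, y):
    """Angle in radians between x and y (undefined for zero vectors)."""
    cos = _dot(x, y) / (math.sqrt(_dot(x, x)) * math.sqrt(_dot(y, y)))
    # Clamp to [-1, 1] to guard against floating-point round-off.
    return math.acos(max(-1.0, min(1.0, cos)))


def dist_tanimoto(x, y):
    """Tanimoto distance, textbook form (undefined if both are zero vectors)."""
    dot = _dot(x, y)
    return 1.0 - dot / (_dot(x, x) + _dot(y, y) - dot)


# Hypothetical lookup table from metric name to function.
DISTANCES = {
    "dist_norm1": dist_norm1,
    "dist_norm2": dist_norm2,
    "squared_dist_norm2": squared_dist_norm2,
    "dist_angle": dist_angle,
    "dist_tanimoto": dist_tanimoto,
}


def resolve_distance(fn_dist="squared_dist_norm2"):
    """Return a distance function for a metric name or a user-supplied callable."""
    if callable(fn_dist):
        return fn_dist  # user-defined (x, y) -> float
    try:
        return DISTANCES[fn_dist]
    except KeyError:
        raise ValueError("unknown distance function: %r" % (fn_dist,))
```

With this sketch, resolve_distance("dist_norm1") returns the Manhattan distance, and passing a callable stands in for the user-defined function case.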
(2) Add an option for weighted average in the voting.
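One possible reading of the weighted option, assuming the voting is over the k nearest neighbors and that closer neighbors should count more, is inverse-distance weighting. The issue does not specify a weighting scheme, so the 1/distance weights, the eps guard and both function names below are assumptions for illustration.

```python
from collections import defaultdict


def weighted_vote(neighbors, eps=1e-12):
    """Pick a class label from (label, distance) pairs of the k nearest points.

    Each neighbor votes with weight 1 / (distance + eps), so closer points
    count more. eps avoids division by zero for exact matches.
    """
    scores = defaultdict(float)
    for label, distance in neighbors:
        scores[label] += 1.0 / (distance + eps)
    return max(scores, key=scores.get)


def weighted_average(neighbors, eps=1e-12):
    """Inverse-distance weighted mean of (value, distance) pairs (regression case)."""
    neighbors = list(neighbors)
    weights = [1.0 / (distance + eps) for _, distance in neighbors]
    total = sum(weights)
    return sum(w * value for w, (value, _) in zip(weights, neighbors)) / total
```

For example, with neighbors [("a", 0.1), ("b", 1.0), ("b", 1.0)] a plain majority vote would pick "b", while the weighted vote picks "a" because the single "a" neighbor is much closer.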
After working on
https://github.com/apache/madlib/pull/184
Himanshu Pandey suggested he would like to work on this as well, so assigning to him.
Thank you Himanshu