Details
Type: New Feature
Status: Closed
Priority: Minor
Resolution: Fixed
Description
On the map side, KMeans currently calculates the distance between a set of seeds and all other vectors. It would be handy to have a generalization of this that, given a set of vectors that fits in memory (the seeds) and a set of other points, emits <seed id, other id, distance> according to the distance measure. This is similar to the RowSimilarityJob, but much simpler and less general-purpose.
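
To make the request concrete, here is a minimal sketch in plain Java of what the map-side step might look like. It is not taken from any patch and does not use Mahout or Hadoop classes: SeedDistances, Distance, Triple, and the map method are hypothetical names chosen for illustration, and Euclidean distance is only a stand-in for whatever distance measure is configured. The seeds are assumed to be already loaded into memory.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: none of these names come from Mahout itself.
class SeedDistances {

    /** Stand-in for a pluggable distance measure (assumption). */
    interface Distance {
        double between(double[] a, double[] b);
    }

    /** One emitted <seed id, other id, distance> triple. */
    static final class Triple {
        final int seedId;
        final int otherId;
        final double distance;

        Triple(int seedId, int otherId, double distance) {
            this.seedId = seedId;
            this.otherId = otherId;
            this.distance = distance;
        }
    }

    /** Euclidean distance, used here only as an example measure. */
    static double euclidean(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    /**
     * Plays the role of a single map() call: compares one incoming
     * point against every in-memory seed and collects one triple per seed.
     */
    static List<Triple> map(Map<Integer, double[]> seeds, int otherId,
                            double[] other, Distance measure) {
        List<Triple> out = new ArrayList<>(seeds.size());
        for (Map.Entry<Integer, double[]> seed : seeds.entrySet()) {
            double d = measure.between(seed.getValue(), other);
            out.add(new Triple(seed.getKey(), otherId, d));
        }
        return out;
    }

    public static void main(String[] args) {
        // The seeds: a small set of vectors assumed to fit in memory.
        Map<Integer, double[]> seeds = new LinkedHashMap<>();
        seeds.put(0, new double[] {0.0, 0.0});
        seeds.put(1, new double[] {3.0, 4.0});

        // One "other" point with id 7, as a mapper would receive it.
        double[] other = {3.0, 0.0};
        for (Triple t : map(seeds, 7, other, SeedDistances::euclidean)) {
            System.out.println(t.seedId + "\t" + t.otherId + "\t" + t.distance);
        }
    }
}
```

In this sketch each input point costs one pass over the seeds, so the work per mapper is proportional to (number of seeds) x (number of input points), and no reduce step is needed to produce the triples.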