  1. Apache MADlib
  2. MADLIB-1370

Knn - add zero check and output distance array

Details

    • Type: Improvement
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: v1.17
    • Component/s: k-NN
    • Labels: None

    Description

      In the unsupervised mode of k-NN
      http://madlib.apache.org/docs/latest/group__grp__knn.html
      when `point_source` and `test_source` are the same data set, the nearest-neighbor search does not reliably return the zero-distance point as a nearest neighbor.

      Could there be a small negative-value issue here, where a distance that is effectively 0 shows up as negative epsilon?
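
      A minimal sketch of the kind of zero check being asked about. This is not MADlib's code, and the issue does not say how the negative value arises. One common source is computing the squared distance in the expanded form ||a||^2 + ||b||^2 - 2 a.b, which can round slightly below zero for near-identical points when the terms are computed separately. The function names and the `tol` parameter are hypothetical.

```python
import math


def squared_dist_expanded(a, b):
    """Squared Euclidean distance via ||a||^2 + ||b||^2 - 2 a.b.

    In floating point this form can come out slightly negative for
    points that are (nearly) identical, because of rounding.
    """
    aa = sum(x * x for x in a)
    bb = sum(y * y for y in b)
    ab = sum(x * y for x, y in zip(a, b))
    return aa + bb - 2.0 * ab


def nonneg_distance(raw_squared_dist, tol=1e-9):
    """Clamp a tiny negative squared distance to zero, then take the root.

    tol is a hypothetical tolerance: anything more negative than -tol is
    treated as a real error rather than rounding noise.
    """
    if raw_squared_dist < -tol:
        raise ValueError("squared distance is significantly negative")
    return math.sqrt(max(0.0, raw_squared_dist))


def safe_dist(a, b):
    """Distance that is never negative and never NaN from a tiny negative input."""
    return nonneg_distance(squared_dist_expanded(a, b))
```

      With a check like this, a value such as -1e-16 is reported as a distance of 0 rather than failing the square root or sorting ahead of or behind the true zero-distance point.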

      Also, please assess whether we can add a vector of distances to the output table:

      Output Format
      The output of the KNN module is a table with the following columns:
      
      id	INTEGER. The ids of test data points.
      test_column_name	DOUBLE PRECISION[]. The test data points.
      prediction	INTEGER. Label in case of classification, average value in case of regression.
      k_nearest_neighbours	INTEGER[]. List of nearest neighbors, sorted closest to furthest from the corresponding test point.
      distance	DOUBLE PRECISION[]. Distances, sorted in the same order as the 'k_nearest_neighbours' array.
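
      To illustrate the proposed pairing of the 'k_nearest_neighbours' and 'distance' columns, here is a brute-force sketch in Python. It is an illustration only, not the MADlib implementation. The function name `knn_with_distances` and the dict-based input are assumptions made for the example.

```python
import math


def knn_with_distances(points, test_point, k):
    """Return (neighbor_ids, distances) for the k closest points.

    points: dict mapping an integer id to a list of floats.
    Both returned lists are sorted closest to furthest, so the i-th
    distance belongs to the i-th neighbor id.
    """
    scored = []
    for pid, vec in points.items():
        sq = sum((x - y) ** 2 for x, y in zip(vec, test_point))
        # Clamp at zero so a rounding artifact can never yield a negative value.
        scored.append((math.sqrt(max(0.0, sq)), pid))
    scored.sort()  # sorts by distance first, then by id to break ties
    top = scored[:k]
    return [pid for _, pid in top], [dist for dist, _ in top]


# Hypothetical data: the test point coincides with the point whose id is 1.
points = {1: [0.0, 0.0], 2: [3.0, 4.0], 3: [1.0, 0.0]}
ids, dists = knn_with_distances(points, [0.0, 0.0], 2)
print(ids, dists)  # [1, 3] [0.0, 1.0]
```

      When the test set and the point set are the same, the first entry of the distance array should be 0 for every row, which makes the behavior described above easy to verify from the output.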
      

          People

            Assignee: Orhan Kislal (okislal)
            Reporter: Frank McQuillan (fmcquillan)
