Spark / SPARK-2335

k-Nearest Neighbor classification and regression for MLlib

Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Minor
    • Resolution: Duplicate
    • None
    • None
    • Component/s: MLlib

    Description

      The k-Nearest Neighbor (kNN) model for classification and regression is a simple, intuitive approach that offers a straightforward path to non-linear decision and estimation contours. Its two downsides, high variance (sensitivity to the particular training set) and the computational cost of labeling new points, both play to Spark's big-data strengths: a large training set reduces the variance, and many workers reduce prediction latency.

      We should include kNN models as options in MLlib.
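
For readers unfamiliar with the model, here is a minimal brute-force sketch in plain Python. It is not Spark or MLlib code, and every function name in it is hypothetical. It assumes Euclidean distance, majority-vote classification, and unweighted-mean regression. The `k_nearest_partitioned` helper only imitates, on local lists, how per-worker top-k candidates could be merged; it is an illustration of the idea, not a proposed MLlib design.

```python
import heapq
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two equal-length feature tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def k_nearest(train, query, k):
    """Labels of the k training points closest to query.

    train is a list of (features, label) pairs.
    """
    nearest = heapq.nsmallest(k, train, key=lambda pair: euclidean(pair[0], query))
    return [label for _, label in nearest]


def knn_classify(train, query, k=3):
    """Predict the most common label among the k nearest neighbors."""
    return Counter(k_nearest(train, query, k)).most_common(1)[0][0]


def knn_regress(train, query, k=3):
    """Predict the unweighted mean label of the k nearest neighbors."""
    labels = k_nearest(train, query, k)
    return sum(labels) / len(labels)


def k_nearest_partitioned(partitions, query, k):
    """Same result as k_nearest, computed partition by partition.

    Each partition (a stand-in for one worker's share of the data) keeps
    only its local top k, and the small candidate lists are merged afterwards.
    """
    def dist(pair):
        return euclidean(pair[0], query)

    candidates = []
    for part in partitions:
        candidates.extend(heapq.nsmallest(k, part, key=dist))
    return [label for _, label in heapq.nsmallest(k, candidates, key=dist)]


if __name__ == "__main__":
    points = [((0.0, 0.0), "a"), ((0.1, 0.2), "a"), ((1.0, 1.0), "b"),
              ((0.9, 1.1), "b"), ((1.2, 0.8), "b")]
    print(knn_classify(points, (0.05, 0.1), k=3))
```

The brute-force search compares the query against every training point, which is the computational cost the description mentions. Because the overall top k is always contained in the union of the per-partition top k lists, that work splits cleanly across workers.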

            People

              Unassigned
              bgawalt Brian Gawalt
              Ashutosh Trivedi
              Votes: 5
              Watchers: 20
