  1. Spark
  2. SPARK-4038

Outlier Detection Algorithm for MLlib

Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Minor
    • Resolution: Won't Fix
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: MLlib
    • Labels: None

    Description

      The aim of this JIRA is to discuss which parallel outlier detection algorithms can be included in MLlib.
      The one I am familiar with is Attribute Value Frequency (AVF). It scales linearly with the number of data points and attributes, and relies on a single data scan. It is not distance-based and is well suited to categorical data. The original paper also gives a parallel version, which is not complicated to implement. I am working on an implementation and will soon submit the initial code for review.
      Here is the link to the paper:
      http://ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=4410382
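
For readers unfamiliar with AVF, here is a minimal single-machine sketch in plain Python. It is not the code proposed for MLlib. It assumes the usual formulation of the method, in which a point's score is the mean frequency of its attribute values and the k lowest-scoring points are reported as outliers. The function names `avf_scores` and `avf_outliers` are made up for illustration.

```python
from collections import Counter


def avf_scores(rows):
    """Return one AVF score per row: the mean frequency of the row's attribute values.

    Assumes every row is a sequence of categorical values of the same length.
    """
    if not rows:
        return []
    m = len(rows[0])
    # Counting step: how often each (attribute index, value) pair occurs.
    # In a parallel setting this is a natural map/reduce-by-key job.
    counts = Counter((j, v) for row in rows for j, v in enumerate(row))
    # Scoring step: average the counts of each row's own values.
    return [sum(counts[(j, v)] for j, v in enumerate(row)) / m for row in rows]


def avf_outliers(rows, k):
    """Return the indices of the k rows with the lowest AVF scores."""
    scores = avf_scores(rows)
    return sorted(range(len(rows)), key=lambda i: scores[i])[:k]


# Hypothetical toy data: the last row uses values that appear nowhere else.
rows = [("a", "x"), ("a", "x"), ("a", "y"), ("b", "z")]
print(avf_scores(rows))
print(avf_outliers(rows, 1))
```

Rows built from rare values get low scores, so no distance computation between points is needed. The counting step is the part that would have to be distributed; the resulting count table is small whenever the attributes have few distinct values.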

      As Xiangrui pointed out in this discussion:
      http://apache-spark-developers-list.1001551.n3.nabble.com/MLlib-Contributing-Algorithm-for-Outlier-Detection-td8880.html
      there are other algorithms as well. Let's discuss which ones are more general and more easily parallelized.

          People

            Unassigned
            Rusty Ashutosh Trivedi
            Votes: 2
            Watchers: 16

              Time Tracking

                Original Estimate: 336h
                Remaining Estimate: 336h
                Time Spent: Not Specified