  1. Commons Math
  2. MATH-1330

KMeans clustering algorithm doesn't support clustering of sparse input data.

    Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: 4.X
    • Labels:
      None

      Description

      Currently, the KMeansPlusPlusClusterer class requires its generic parameter T to extend the Clusterable interface, which is:

      public interface Clusterable {
      
          /**
           * Gets the n-dimensional point.
           *
           * @return the point array
           */
          double[] getPoint();
      }
      

      That is, it returns a dense representation of the clusterable data, which makes it impossible to efficiently compute k-means clustering on high-dimensional but very sparse data. I think it would be much better if the Clusterable interface returned a Vector, allowing the use of SparseVectors while clustering the data. Of course, the KMeansPlusPlusClusterer implementation, and I assume the other clustering implementations, would need to be refactored accordingly to support this.
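
To illustrate the idea, here is a minimal sketch of what a sparse-friendly contract could look like. It is not part of Commons Math: the `SparseClusterable` interface, the `point` factory, and the `squaredDistance` helper are all hypothetical names introduced for this example, and a plain `Map` stands in for whatever vector type the library would actually use. The sketch assumes centroids stay dense, and shows that the distance from a sparse point to a centroid can be computed by visiting only the point's non-zero entries once the centroid's squared norm is cached.

```java
import java.util.Map;
import java.util.TreeMap;

final class SparseClusteringSketch {

    /** Hypothetical sparse counterpart of Clusterable (an assumption, not a Commons Math type). */
    interface SparseClusterable {
        /** Total number of dimensions, including the implicit zeros. */
        int getDimension();

        /** Non-zero coordinates only, keyed by dimension index. */
        Map<Integer, Double> getNonZeroEntries();
    }

    /** Builds a sparse point from parallel arrays of indices and values. */
    static SparseClusterable point(final int dimension, int[] indices, double[] values) {
        final Map<Integer, Double> entries = new TreeMap<>();
        for (int i = 0; i < indices.length; i++) {
            if (values[i] != 0.0) {
                entries.put(indices[i], values[i]);
            }
        }
        return new SparseClusterable() {
            @Override public int getDimension() { return dimension; }
            @Override public Map<Integer, Double> getNonZeroEntries() { return entries; }
        };
    }

    /** Squared Euclidean norm of a dense vector; computed once per centroid update. */
    static double squaredNorm(double[] v) {
        double sum = 0.0;
        for (double x : v) {
            sum += x * x;
        }
        return sum;
    }

    /**
     * Squared Euclidean distance between a sparse point p and a dense centroid c.
     * Uses ||p - c||^2 = ||c||^2 + sum over non-zero p_i of (p_i^2 - 2 * p_i * c_i),
     * so the cost depends on the number of non-zeros, not on the dimension.
     */
    static double squaredDistance(SparseClusterable p, double[] centroid, double centroidSquaredNorm) {
        double d = centroidSquaredNorm;
        for (Map.Entry<Integer, Double> e : p.getNonZeroEntries().entrySet()) {
            double v = e.getValue();
            d += v * v - 2.0 * v * centroid[e.getKey()];
        }
        // Guard against a tiny negative result caused by floating-point cancellation.
        return Math.max(0.0, d);
    }

    public static void main(String[] args) {
        SparseClusterable p = point(5, new int[] {1, 4}, new double[] {3.0, 4.0});
        double[] centroid = {1.0, 1.0, 1.0, 1.0, 1.0};
        System.out.println(squaredDistance(p, centroid, squaredNorm(centroid)));
    }
}
```

With a dense `double[]` per point, every distance evaluation touches all dimensions; in this sketch only the two stored entries of `p` are visited. Whether the real interface should expose a vector type, an entry iterator, or something else is left open here.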

              People

              • Assignee: Unassigned
              • Reporter: C0rWin Artem Barger
              • Votes: 0
              • Watchers: 2
