Key: MATH-266
Type: New Feature
Status: Closed
Resolution: Fixed
Priority: Major
Assignee: Luc Maisonobe
Reporter: Ben McCann
Votes: 0
Watchers: 0

Commons Math

Support for Clustering Algorithms

Created: 01/May/09 11:08 PM   Updated: 07/Aug/09 09:14 AM
Component/s: None
Affects Version/s: 2.0
Fix Version/s: 2.0

Time Tracking:
Not Specified

File Attachments:
clustering.zip (Zip Archive, 5 kB), licensed for inclusion in ASF works, attached 2009-05-02 04:58 PM by Ben McCann

Resolution Date: 02/May/09 07:40 PM


Description
It'd be nice if Commons Math could run K-means or some other clustering algorithms.

Ben McCann added a comment - 01/May/09 11:15 PM
Here's an implementation of the k-means++ clustering algorithm.
I tested it by using it for image segmentation, and it worked well (reducing the number of colors in an image by clustering on pixel values; for an example see http://www.leet.it/home/lale/joomla/component/option,com_wrapper/Itemid,50/). It was much, much faster than Rafael Santos's library, which I ran as a comparison. If all looks good and this is committed, I'll follow up shortly by submitting some unit tests for it.

Ben McCann added a comment - 01/May/09 11:16 PM
It looks like I accidentally checked the "not intended for inclusion" radio button. This obviously is intended for inclusion and you have my full permission to commit it.

Luc Maisonobe added a comment - 02/May/09 11:21 AM
This is an interesting addition to commons-math.
I think it would be better to put the clustering package under stat. Perhaps later we could add an EM implementation too.
Could you rework your patch to include the ASF license header and complete Javadoc, including algorithm references and descriptions of parameters and return values?

thanks


Ben McCann added a comment - 02/May/09 04:58 PM
Here are the updated files with the requested changes. Thanks for taking a look!

Luc Maisonobe added a comment - 02/May/09 07:40 PM
Fixed in the Subversion repository as of r770979.

I have changed a few things:

  • renamed the Point interface to Clusterable (not sure this is really English ...)
  • used generics
  • added missing Javadoc (mainly class-level)
  • added package.html
  • created a test
  • replaced an Integer parameter, where null meant unlimited, with an int parameter, where -1 means unlimited
  • fixed a missing loop in the initial centers computation (only two centers were computed instead of k)
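
For readers unfamiliar with the pieces named in the list above, here is a minimal, self-contained sketch of how a generic Clusterable-style interface, a k-means++ seeding loop that runs until k centers exist, and an int iteration limit with -1 meaning unlimited can fit together. It is not the code committed in r770979: the class name KMeansPlusPlusSketch, the Point2D type, and the method names distanceFrom, centroidOf, chooseInitialCenters, assign, and cluster are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.Random;

final class KMeansPlusPlusSketch {

    /** Hypothetical stand-in for the Clusterable interface (method names are assumed). */
    public interface Clusterable<T> {
        double distanceFrom(T other);
        T centroidOf(Collection<T> points);
    }

    /** Simple 2-D point, used only for this illustration. */
    public static final class Point2D implements Clusterable<Point2D> {
        final double x, y;
        public Point2D(double x, double y) { this.x = x; this.y = y; }
        public double distanceFrom(Point2D o) { return Math.hypot(x - o.x, y - o.y); }
        public Point2D centroidOf(Collection<Point2D> pts) {
            double sx = 0, sy = 0;
            for (Point2D p : pts) { sx += p.x; sy += p.y; }
            return new Point2D(sx / pts.size(), sy / pts.size());
        }
    }

    /** k-means++ seeding: each new center is drawn with probability proportional to D^2. */
    public static <T extends Clusterable<T>> List<T> chooseInitialCenters(List<T> points, int k, Random rnd) {
        List<T> centers = new ArrayList<>();
        centers.add(points.get(rnd.nextInt(points.size())));
        double[] d2 = new double[points.size()];
        while (centers.size() < k) { // keep looping until all k centers are chosen
            double sum = 0;
            for (int i = 0; i < points.size(); i++) {
                double best = Double.MAX_VALUE;
                for (T c : centers) {
                    double d = points.get(i).distanceFrom(c);
                    best = Math.min(best, d * d);
                }
                d2[i] = best;
                sum += best;
            }
            if (sum == 0) { // every point coincides with a center: fall back to a uniform pick
                centers.add(points.get(rnd.nextInt(points.size())));
                continue;
            }
            double r = rnd.nextDouble() * sum; // r lies in [0, sum)
            double acc = 0;
            int chosen = points.size() - 1;
            for (int i = 0; i < points.size(); i++) {
                acc += d2[i];
                if (acc > r) { chosen = i; break; }
            }
            centers.add(points.get(chosen));
        }
        return centers;
    }

    /** Assigns every point to its nearest center. */
    public static <T extends Clusterable<T>> List<List<T>> assign(List<T> points, List<T> centers) {
        List<List<T>> clusters = new ArrayList<>();
        for (int j = 0; j < centers.size(); j++) { clusters.add(new ArrayList<>()); }
        for (T p : points) {
            int best = 0;
            for (int j = 1; j < centers.size(); j++) {
                if (p.distanceFrom(centers.get(j)) < p.distanceFrom(centers.get(best))) { best = j; }
            }
            clusters.get(best).add(p);
        }
        return clusters;
    }

    /** Lloyd iterations after k-means++ seeding; a negative maxIterations means unlimited. */
    public static <T extends Clusterable<T>> List<List<T>> cluster(List<T> points, int k, int maxIterations, Random rnd) {
        List<T> centers = chooseInitialCenters(points, k, rnd);
        int max = (maxIterations < 0) ? Integer.MAX_VALUE : maxIterations;
        List<List<T>> clusters = assign(points, centers);
        for (int iter = 0; iter < max; iter++) {
            List<T> newCenters = new ArrayList<>();
            for (int j = 0; j < k; j++) {
                List<T> members = clusters.get(j);
                // An empty cluster keeps its previous center in this sketch.
                newCenters.add(members.isEmpty() ? centers.get(j) : centers.get(j).centroidOf(members));
            }
            List<List<T>> newClusters = assign(points, newCenters);
            boolean changed = !newClusters.equals(clusters);
            centers = newCenters;
            clusters = newClusters;
            if (!changed) { break; } // assignments are stable: converged
        }
        return clusters;
    }
}
```

A call such as KMeansPlusPlusSketch.cluster(points, 3, -1, new Random()) would iterate until the assignments stop changing, while passing a non-negative third argument caps the number of iterations. Using a primitive int with a sentinel avoids null checks at the call site, at the cost of a magic value that has to be documented.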

Could you please check that everything is fine, and either reopen the issue if something is wrong or open a new one if you want to add something?

thanks for the report and for the patch


Ben McCann added a comment - 04/May/09 07:11 PM
Thanks Luc. It all looks good to me from what I've seen. I appreciate the
quick response.

-Ben


Luc Maisonobe added a comment - 07/Aug/09 09:14 AM
closing resolved issue for 2.0 release