  1. Commons Math
  2. MATH-1371

Provide accelerated kmeans++ implementation

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Implemented
    • Affects Version/s: None
    • Fix Version/s: 4.0
    • Component/s: None
    • Labels: None

    Description

      An accelerated version of the kmeans++ algorithm is available, published in: Elkan, Charles. "Using the triangle inequality to accelerate k-means." ICML. Vol. 3. 2003.

      The main idea is to speed up the kmeans iterations by skipping distance computations between centers and points when they are not needed. For example, if a cluster center has not moved far from a point after an update, the assignment of that point does not change. The accelerated algorithm avoids unnecessary distance calculations by applying the triangle inequality in two different ways, and by keeping track of lower and upper bounds for the distances between points and centers.
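
      The pruning idea can be illustrated with a minimal sketch. It is not the attached ElkanKmeansPlusPlusClusterer; the class name TriangleInequalityPruningSketch and its methods are hypothetical, and it shows only one of the two triangle-inequality tests from the paper: if d(c, c') >= 2 * d(x, c), then d(x, c') >= d(x, c), so the distance from point x to center c' need not be computed. The lower and upper bounds that the full algorithm carries across iterations are left out.

```java
import java.util.Arrays;

/** Hypothetical illustration only; not the attached ElkanKmeansPlusPlusClusterer. */
final class TriangleInequalityPruningSketch {

    /** Point-to-center distances computed by the last call to assign(). */
    static int distanceComputations;

    /** Euclidean distance between two vectors of equal length. */
    static double distance(double[] a, double[] b) {
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    /** Assigns each point to its nearest center, skipping provably useless distances. */
    static int[] assign(double[][] points, double[][] centers) {
        int k = centers.length;
        // Center-to-center distances, computed once per assignment pass.
        double[][] cc = new double[k][k];
        for (int i = 0; i < k; i++) {
            for (int j = i + 1; j < k; j++) {
                cc[i][j] = cc[j][i] = distance(centers[i], centers[j]);
            }
        }
        distanceComputations = 0;
        int[] assignment = new int[points.length];
        for (int p = 0; p < points.length; p++) {
            int best = 0;
            double bestDist = distance(points[p], centers[0]);
            distanceComputations++;
            for (int c = 1; c < k; c++) {
                // Triangle inequality: if d(best, c) >= 2 * d(x, best),
                // then d(x, c) >= d(x, best), so center c cannot be closer.
                if (cc[best][c] >= 2 * bestDist) {
                    continue;
                }
                double d = distance(points[p], centers[c]);
                distanceComputations++;
                if (d < bestDist) {
                    bestDist = d;
                    best = c;
                }
            }
            assignment[p] = best;
        }
        return assignment;
    }

    public static void main(String[] args) {
        double[][] points = {{0, 0}, {1, 0}, {10, 10}, {11, 10}};
        double[][] centers = {{0.5, 0}, {10.5, 10}};
        System.out.println(Arrays.toString(assign(points, centers)));
        System.out.println("distances computed: " + distanceComputations);
    }
}
```

      In this toy input the two points near the first center are assigned without ever measuring their distance to the second center, so fewer than the points-times-centers brute-force number of distances is computed. The savings depend on the order in which centers are examined, which is one reason the full algorithm tracks bounds between iterations.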

      The algorithm is described in full in the paper.

      Attachments

        1. ElkanKmeansPlusPlusClusterer.java
          14 kB
          Artem Barger
        2. ElkanKmeansPlusPlusClustererTest.java
          6 kB
          Artem Barger

            People

              C0rWin Artem Barger
              Votes: 1
              Watchers: 3

                Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 20m