Commons Math / MATH-1374

KMeansPlusPlusClusterer fails to converge when the input dataset contains repeated points


Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: 4.X
    • Component/s: None
    • Labels: None

    Description

      If the input list size of Clusterable is greater than parameter k while has less unique points than k, the algorithm will fail to converge, tested w/ different EmptyClusterStrategy options, here is the example of default one:

        @Test
        public void testNumberOfRequestedClustersSameAsInputSize() {
            // Generates 10-dimensional vectors with independent standard Gaussian components.
            final RandomVectorGenerator rng = new UncorrelatedRandomVectorGenerator(10,
                    new GaussianRandomGenerator(RandomSource.create(RandomSource.MT)));

            List<DoublePoint> points = new ArrayList<>();

            // 10 distinct points, each added 3 times: 30 input points, only 10 of them unique.
            for (int i = 0; i < 10; i++) {
                final DoublePoint point = new DoublePoint(rng.nextVector());
                for (int j = 0; j < 3; j++) {
                    points.add(point);
                }
            }

            // k = 12 is smaller than the 30 input points but larger than the 10 unique points.
            final KMeansPlusPlusClusterer<DoublePoint> clusterer = new KMeansPlusPlusClusterer<>(12);
            clusterer.cluster(points);
        }
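      Why repeated points are a problem can be seen from the k-means++ seeding step. With nearest-center assignment that breaks ties consistently, identical points always land in the same cluster, so in the test above at most 10 of the 12 clusters can be non-empty in any iteration. The seeding itself also runs out of candidates: once every distinct point has been chosen as a center, every point is at distance 0 from its nearest center, so D^2 sampling has no weight left to draw a new, distinct center from. The following is a minimal, self-contained sketch in plain Java (no Commons Math dependency; the class KMeansPlusPlusSeedingSketch and its methods are hypothetical) of D^2 seeding that stops early and returns fewer than k centers in that situation. This is one possible policy, not necessarily what KMeansPlusPlusClusterer does.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Random;
import java.util.Set;

/** Hypothetical sketch, not Commons Math code: k-means++ seeding that never duplicates a center. */
final class KMeansPlusPlusSeedingSketch {

    /** Squared Euclidean distance between two points of the same dimension. */
    static double distSq(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            final double d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }

    /** Number of distinct points, comparing coordinates with Double.equals. */
    static int countDistinct(List<double[]> points) {
        final Set<List<Double>> seen = new HashSet<>();
        for (double[] p : points) {
            final List<Double> key = new ArrayList<>(p.length);
            for (double v : p) {
                key.add(v);
            }
            seen.add(key);
        }
        return seen.size();
    }

    /**
     * Chooses up to k centers by D^2 sampling and stops early, returning fewer than k
     * centers, once every point coincides with a chosen center. Assumes non-empty input, k >= 1.
     */
    static List<double[]> chooseCenters(List<double[]> points, int k, Random random) {
        final int n = points.size();
        final List<double[]> centers = new ArrayList<>();
        // First center: a uniformly random input point.
        centers.add(points.get(random.nextInt(n)));
        // minDistSq[i] = squared distance from point i to its nearest chosen center.
        final double[] minDistSq = new double[n];
        for (int i = 0; i < n; i++) {
            minDistSq[i] = distSq(points.get(i), centers.get(0));
        }
        while (centers.size() < k) {
            double total = 0.0;
            for (double d : minDistSq) {
                total += d;
            }
            if (total == 0.0) {
                // Every point coincides with a center: no new distinct center exists.
                break;
            }
            // Draw index i with probability minDistSq[i] / total, skipping zero-weight points.
            // The cumulative sum reaches total >= r, so some index is always selected.
            final double r = random.nextDouble() * total;
            double cumulative = 0.0;
            int next = -1;
            for (int i = 0; i < n && next < 0; i++) {
                if (minDistSq[i] > 0.0) {
                    cumulative += minDistSq[i];
                    if (cumulative >= r) {
                        next = i;
                    }
                }
            }
            final double[] center = points.get(next);
            centers.add(center);
            for (int i = 0; i < n; i++) {
                minDistSq[i] = Math.min(minDistSq[i], distSq(points.get(i), center));
            }
        }
        return centers;
    }

    public static void main(String[] args) {
        final Random random = new Random(42);
        final List<double[]> points = new ArrayList<>();
        // Mirrors the report: 10 distinct 10-dimensional Gaussian points, each added 3 times.
        for (int i = 0; i < 10; i++) {
            final double[] p = new double[10];
            for (int d = 0; d < p.length; d++) {
                p[d] = random.nextGaussian();
            }
            for (int j = 0; j < 3; j++) {
                points.add(p);
            }
        }
        final int k = 12;
        System.out.println(points.size() + " points, " + countDistinct(points) + " distinct");
        System.out.println(chooseCenters(points, k, random).size() + " centers for k = " + k);
    }
}
```

      With the reporter's setup (10 distinct points repeated 3 times, k = 12) the sketch returns 10 centers instead of looping. Other policies are equally plausible, for example rejecting the input when it has fewer than k distinct points, or deduplicating it before clustering; the issue does not say which behavior the library should adopt.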
      


          People

            Artem Barger (C0rWin)


              Time Tracking

                Estimated: Not Specified
                Remaining: 0h
                Logged: 10m
