Mahout / MAHOUT-900

RandomSeedGenerator samples / output k texts incorrectly


    Details

    • Type: Bug
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 0.5
    • Fix Version/s: 0.6
    • Component/s: Clustering
    • Labels: None

      Description

                int currentSize = chosenTexts.size();
                if (currentSize < k) {
                  chosenTexts.add(newText);
                  chosenClusters.add(newCluster);
                } else if (random.nextInt(currentSize + 1) == 0) { // with chance 1/(currentSize+1) pick new element
                  int indexToRemove = random.nextInt(currentSize); // evict one chosen randomly
                  chosenTexts.remove(indexToRemove);
                  chosenClusters.remove(indexToRemove);
                  chosenTexts.add(newText);
                  chosenClusters.add(newCluster);
                }
      

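      The second "if" condition ought to be "!= 0", right? The body evicts a randomly chosen existing element to make room for the new one. If the new element is treated as one of currentSize + 1 candidates for eviction, the body should be skipped only when the draw is 0 (chance 1/(currentSize+1)), because in that case the new element itself is the one evicted.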
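      A minimal sketch of the replacement step with the proposed "!= 0" condition, for illustration only. It is not the attached patch: the class PatchedSeedSampling and its methods are hypothetical, and plain Strings stand in for the Text/Cluster pairs. With k = 2 and inputs "a", "b", "c", the third element should be kept about 2/3 of the time; with the original "== 0" it would be kept only about 1/3 of the time.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical stand-in for the sampling step in RandomSeedGenerator.
// Strings replace the Text/Cluster pairs; names are illustrative only.
final class PatchedSeedSampling {

  // Offers one new element to a sample holding at most k elements,
  // using the corrected "!= 0" condition proposed in this issue.
  static <T> void offer(List<T> chosen, T newItem, int k, Random random) {
    int currentSize = chosen.size();
    if (currentSize < k) {
      chosen.add(newItem); // still filling up: always keep
    } else if (random.nextInt(currentSize + 1) != 0) {
      // With chance currentSize/(currentSize+1) keep the new element
      // and evict one existing element chosen uniformly at random.
      int indexToRemove = random.nextInt(currentSize);
      chosen.remove(indexToRemove); // int argument: removes by index
      chosen.add(newItem);
    }
    // Otherwise (chance 1/(currentSize+1)) the new element is dropped.
  }

  // Runs offer() over the whole input and returns the sample.
  static List<String> sample(List<String> input, int k, Random random) {
    List<String> chosen = new ArrayList<>();
    for (String item : input) {
      offer(chosen, item, k, random);
    }
    return chosen;
  }

  public static void main(String[] args) {
    Random random = new Random(42);
    int trials = 100_000;
    int keptThird = 0;
    for (int t = 0; t < trials; t++) {
      if (sample(List.of("a", "b", "c"), 2, random).contains("c")) {
        keptThird++;
      }
    }
    // Expected to be close to 2/3 with the "!= 0" condition.
    System.out.printf("third element kept in %.3f of trials%n",
        (double) keptThird / trials);
  }
}
```

      One caveat worth checking: because currentSize stops growing once the sample holds k elements, every later element is kept with the same chance k/(k+1), so this step on its own does not yield a uniform sample over the whole input. Textbook reservoir sampling (Algorithm R) instead keeps the n-th element with chance k/n.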

      Second, this code:

              for (int i = 0; i < k; i++) {
                writer.append(chosenTexts.get(i), chosenClusters.get(i));
              }
      

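      ... assumes that the input contained at least k elements, and fails otherwise: chosenTexts.get(i) throws an IndexOutOfBoundsException once i reaches chosenTexts.size(). The loop probably needs to be capped at chosenTexts.size().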
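      A minimal sketch of the capped loop, assuming a BiConsumer as a hypothetical stand-in for the writer's append call and Strings in place of the Text/Cluster pairs; CappedSeedWriter and writeChosen are illustrative names, not Mahout API. The loop is bounded by the number of elements actually sampled, which is never more than k.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiConsumer;

// Hypothetical sketch: BiConsumer stands in for the writer's append call,
// and Strings stand in for the Text/Cluster pairs.
final class CappedSeedWriter {

  // Writes every chosen (text, cluster) pair and returns how many were
  // written. Bounding the loop by chosenTexts.size() (at most k) avoids
  // calling get(i) past the end when the input had fewer than k elements.
  static <K, V> int writeChosen(List<K> chosenTexts, List<V> chosenClusters,
                                BiConsumer<K, V> append) {
    int n = chosenTexts.size();
    for (int i = 0; i < n; i++) {
      append.accept(chosenTexts.get(i), chosenClusters.get(i));
    }
    return n;
  }

  public static void main(String[] args) {
    int k = 5;
    // Only two elements were sampled although k = 5 were requested.
    List<String> texts = List.of("t1", "t2");
    List<String> clusters = List.of("c1", "c2");
    List<String> written = new ArrayList<>();
    int count = writeChosen(texts, clusters, (t, c) -> written.add(t + "=" + c));
    // prints "2 of 5 requested seeds written: [t1=c1, t2=c2]"
    System.out.println(count + " of " + k + " requested seeds written: " + written);
  }
}
```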

      Patch attached.


            People

            • Assignee: Sean R. Owen (srowen)
            • Reporter: Sean R. Owen (srowen)
