Hama / HAMA-834

Fix KMeans example

Details

    Description

      Fix problems in KMeans example and revise test case.

      1) Typo [1] and input path issue

      2) Wrong summationCount in assignCentersInternal
      summationCount should also be incremented in the following branch [2]:

      if (clusterCenter == null) {
        newCenterArray[lowestDistantCenter] = key;
      }
      

      Otherwise summationCount may stay at zero when only one value is assigned to a center. This zero is then propagated to incrementSum [3] and can cause a division by zero in [4].

      More generally, the count is always one too low. If three vectors are summed but summationCount is only two, the result is wrong, because the summed vector is later divided by the number of increments.
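
      The effect of the missing increment can be reproduced in isolation. The following is a minimal sketch, not the KMeansBSP code itself: the class SummationCountSketch, the method run, and the countFirst flag are hypothetical, vectors are modeled as plain double[] arrays, and a single center is assumed so that every vector is assigned to it. Only the names newCenterArray, lowestDistantCenter, key, clusterCenter and summationCount are taken from the issue.

```java
import java.util.Arrays;

public class SummationCountSketch {

  // Sums all vectors into one center and returns the averaged center.
  // countFirst = false mimics the behavior described in this issue
  // (the first assigned vector is not counted);
  // countFirst = true applies the proposed fix.
  static double[] run(boolean countFirst, double[]... vectors) {
    double[][] newCenterArray = new double[1][];
    int[] summationCount = new int[1];
    int lowestDistantCenter = 0; // assumption: one center, every vector maps to it

    for (double[] key : vectors) {
      double[] clusterCenter = newCenterArray[lowestDistantCenter];
      if (clusterCenter == null) {
        // First vector for this center: stored, but only counted with the fix.
        newCenterArray[lowestDistantCenter] = key.clone();
        if (countFirst) {
          summationCount[lowestDistantCenter]++;
        }
      } else {
        // Subsequent vectors: added to the running sum and counted.
        for (int i = 0; i < key.length; i++) {
          clusterCenter[i] += key[i];
        }
        summationCount[lowestDistantCenter]++;
      }
    }

    // Divide the summed vector by the number of increments.
    double[] mean = newCenterArray[lowestDistantCenter].clone();
    for (int i = 0; i < mean.length; i++) {
      mean[i] /= summationCount[lowestDistantCenter];
    }
    return mean;
  }

  public static void main(String[] args) {
    double[][] three = {{1, 1}, {2, 2}, {3, 3}};
    System.out.println(Arrays.toString(run(false, three))); // [3.0, 3.0], sum 6 divided by 2
    System.out.println(Arrays.toString(run(true, three)));  // [2.0, 2.0], sum 6 divided by 3
    // One assigned vector and a count of zero: with doubles this yields Infinity.
    System.out.println(Arrays.toString(run(false, new double[] {4, 4})));
  }
}
```

      In this sketch the division uses double arithmetic, so a zero count produces Infinity rather than an exception. How the zero surfaces in the real code depends on the types used at [4].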

      3) Results depend on numBspTask
      (the output varies when the number of BSP tasks is changed)

      [1] https://github.com/apache/hama/blob/trunk/ml/src/main/java/org/apache/hama/ml/kmeans/KMeansBSP.java#L518-519
      [2] https://github.com/apache/hama/blob/trunk/ml/src/main/java/org/apache/hama/ml/kmeans/KMeansBSP.java#L249
      [3] https://github.com/apache/hama/blob/trunk/ml/src/main/java/org/apache/hama/ml/kmeans/KMeansBSP.java#L161
      [4] https://github.com/apache/hama/blob/trunk/ml/src/main/java/org/apache/hama/ml/kmeans/KMeansBSP.java#L172

      Attachments

        1. HAMA-834_v02.patch
          7 kB
          Edward J. Yoon
        2. HAMA-834_v03.patch
          10 kB
          Martin Illecker
        3. HAMA-834_v04.patch
          0.9 kB
          Martin Illecker
        4. HAMA-834.patch
          8 kB
          Martin Illecker

          People

            bafu Martin Illecker