Details

    • Type: New Feature
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.1
    • Component/s: Clustering
    • Labels:
      None

      Description

      The Fuzzy K-Means clustering algorithm is an extension of the traditional K-Means clustering algorithm that performs soft clustering: each point belongs to every cluster with some degree of membership rather than to exactly one cluster.

      More details about fuzzy k-means can be found here: http://en.wikipedia.org/wiki/Data_clustering#Fuzzy_c-means_clustering

      I have implemented a Fuzzy K-Means prototype and tests in org.apache.mahout.clustering.fuzzykmeans
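
To make "soft clustering" concrete, here is a minimal sketch of the standard fuzzy c-means membership formula, u_j = 1 / sum_k (d_j / d_k)^(2 / (m - 1)), where d_j is the distance from a point to center j and m > 1 is the fuzziness parameter. This is not the code from the attached patch: the class name FuzzyMembershipSketch, its method names, and the choice of Euclidean distance are assumptions made for illustration only.

```java
import java.util.Arrays;

/** Illustrative sketch only; hypothetical class, not part of the MAHOUT-74 patch. */
final class FuzzyMembershipSketch {

  /** Euclidean distance between two points of equal dimension (assumed measure). */
  static double distance(double[] a, double[] b) {
    double sum = 0.0;
    for (int i = 0; i < a.length; i++) {
      double diff = a[i] - b[i];
      sum += diff * diff;
    }
    return Math.sqrt(sum);
  }

  /**
   * Degree of membership of a point in each cluster:
   * u_j = 1 / sum_k (d_j / d_k)^(2 / (m - 1)), defined for m > 1.
   */
  static double[] memberships(double[] point, double[][] centers, double m) {
    if (m <= 1.0) {
      throw new IllegalArgumentException("m must be greater than 1");
    }
    double[] d = new double[centers.length];
    for (int j = 0; j < centers.length; j++) {
      d[j] = distance(point, centers[j]);
      if (d[j] == 0.0) {
        // The point coincides with a center: give it full membership there.
        double[] exact = new double[centers.length];
        exact[j] = 1.0;
        return exact;
      }
    }
    double exponent = 2.0 / (m - 1.0);
    double[] u = new double[centers.length];
    for (int j = 0; j < centers.length; j++) {
      double denom = 0.0;
      for (int k = 0; k < centers.length; k++) {
        denom += Math.pow(d[j] / d[k], exponent);
      }
      u[j] = 1.0 / denom;
    }
    return u;
  }

  public static void main(String[] args) {
    double[][] centers = {{0.0, 0.0}, {3.0, 0.0}};
    double[] point = {1.0, 0.0};
    // The point is at distance 1 from the first center and 2 from the second.
    System.out.println(Arrays.toString(memberships(point, centers, 2.0)));
  }
}
```

With m = 2, the point at distance 1 from one center and 2 from the other gets memberships of about 0.8 and 0.2, which sum to 1. A hard K-Means assignment would instead put the point entirely in the nearer cluster.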

      1. MAHOUT-74.patch
        58 kB
        Pallavi Palleti
      2. MAHOUT-74.patch
        60 kB
        Grant Ingersoll
      3. mahout-74.patch
        55 kB
        Pallavi Palleti
      4. mahout-74.patch
        60 kB
        Grant Ingersoll

        Activity

        Pallavi Palleti added a comment -

        I have implemented a Fuzzy K-Means prototype and tests. Please review the code.

        Grant Ingersoll added a comment -

        Here's an update that compiles against trunk.

        Grant Ingersoll added a comment -

        Couple of questions:

        1. What's the urlCount for on SoftCluster?

        2. Shouldn't SoftCluster.m be non-final (and configurable)?

        3. It seems like there should be an opportunity for more inheritance/overlap, etc. w/ the K-Means clustering, but I'd have to think about it a bit more.

        The Wikipedia article implies that m == 1 is "similar" to KMeans. Could we make KMeans just a special case of fuzzy k-means by choosing the parameters appropriately?

        Otherwise, the tests pass and it looks to be in pretty good shape. Would be cool to have an example added, but not required for this patch to go in.
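
On the m == 1 question above: in the standard fuzzy c-means membership formula the exponent is 2 / (m - 1), which is undefined at exactly m = 1, so K-Means behaviour appears only as the limit when m approaches 1 from above. The sketch below illustrates that limit for fixed, strictly positive distances. The class name FuzzinessLimitSketch and its method are hypothetical and are not taken from the patch.

```java
/** Illustrative sketch only; hypothetical class, not part of the MAHOUT-74 patch. */
final class FuzzinessLimitSketch {

  /**
   * Memberships computed from precomputed, strictly positive distances:
   * u_j = 1 / sum_k (d_j / d_k)^(2 / (m - 1)), defined for m > 1.
   */
  static double[] membershipsFromDistances(double[] d, double m) {
    if (m <= 1.0) {
      throw new IllegalArgumentException("m must be greater than 1");
    }
    double exponent = 2.0 / (m - 1.0);
    double[] u = new double[d.length];
    for (int j = 0; j < d.length; j++) {
      double denom = 0.0;
      for (int k = 0; k < d.length; k++) {
        denom += Math.pow(d[j] / d[k], exponent);
      }
      u[j] = 1.0 / denom;
    }
    return u;
  }

  public static void main(String[] args) {
    // One point at distance 1 from the first center and 2 from the second.
    double[] d = {1.0, 2.0};
    for (double m : new double[] {3.0, 2.0, 1.5, 1.1, 1.01}) {
      double[] u = membershipsFromDistances(d, m);
      System.out.printf("m=%.2f -> u=[%.4f, %.4f]%n", m, u[0], u[1]);
    }
  }
}
```

As m shrinks toward 1, the membership in the nearer cluster tends to 1 and the other to 0, which matches the hard assignment K-Means would make. Larger values of m spread the membership more evenly.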

        Pallavi Palleti added a comment -

        Hi Grant,
        urlCount is an unnecessary variable. It was added by mistake.
        SoftCluster.m should be configurable. I am sorry, I forgot to modify it.

        Pallavi Palleti added a comment -

        Modified the code to remove urlCount (an unnecessary variable) and made "m" configurable. Also made the distance measure class configurable.

        Grant Ingersoll added a comment -

        Looking pretty good, Pallavi. I modified it slightly so that m is set just via the JobConf like the other values. I think we are in pretty good shape and I will commit soon. I also made m a float. Looking at the wiki link you have there, I don't see any reason why m should be restricted to an int.

        Grant Ingersoll added a comment -

        Committed revision 688122.


          People

          • Assignee:
            Grant Ingersoll
            Reporter:
            Pallavi Palleti
          • Votes:
            0
            Watchers:
            2

            Dates

            • Created:
              Updated:
              Resolved:
