Mahout / MAHOUT-933 Implement mapreduce version of ClusterIterator / MAHOUT-991

Convert Canopy, MeanShift, K-means, Dirichlet, Fuzzy KMeans and Other Tools to emit ClusterWritable

    Details

    • Type: Sub-task
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.6
    • Fix Version/s: 0.7
    • Component/s: Clustering
    • Labels:
      None

    Description

    Adjust the Canopy, MeanShift, K-means, Dirichlet and Fuzzy KMeans implementations to emit ClusterWritables instead of Clusters. Adjust the other clustering tools (ClusterDumper and ClusterEvaluators) to accept the ClusterWritables produced by these algorithms.

    The new ClusterIterator and ClusterClassifier use an expanded sequence file representation that stores Clusters as self-describing ClusterWritable objects. Once all of these algorithms emit ClusterWritables, KMeans, Dirichlet and FuzzyK will be able to use ClusterIterator and ClusterClassifier for the buildClusters phase.
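    The idea of a "self-describing" cluster record can be illustrated with a small, hypothetical Java sketch. It uses a plain java.io DataOutput/DataInput round trip rather than Hadoop's Writable machinery or Mahout's actual ClusterWritable class; SimpleCluster, CenterCluster and SelfDescribingClusterWritable are illustrative names, not Mahout APIs. The wrapper writes the concrete class name ahead of the cluster's own fields, so a reader (a dump or evaluation tool, for instance) can rebuild the right cluster type from the bytes alone.

```java
import java.io.*;

// Hypothetical stand-in for a cluster model; NOT Mahout's Cluster interface.
interface SimpleCluster {
  void write(DataOutput out) throws IOException;
  void readFields(DataInput in) throws IOException;
  String describe();
}

// Hypothetical cluster type: an id plus a center vector.
class CenterCluster implements SimpleCluster {
  int id;
  double[] center = new double[0];

  public CenterCluster() {} // no-arg constructor needed for reflective creation

  CenterCluster(int id, double[] center) {
    this.id = id;
    this.center = center;
  }

  public void write(DataOutput out) throws IOException {
    out.writeInt(id);
    out.writeInt(center.length);
    for (double d : center) out.writeDouble(d);
  }

  public void readFields(DataInput in) throws IOException {
    id = in.readInt();
    center = new double[in.readInt()];
    for (int i = 0; i < center.length; i++) center[i] = in.readDouble();
  }

  public String describe() {
    return "CenterCluster{id=" + id + ", dims=" + center.length + "}";
  }
}

// Self-describing wrapper (hypothetical): stores a type tag, then the payload,
// so the reader does not need to know the concrete cluster class in advance.
class SelfDescribingClusterWritable {
  private SimpleCluster value;

  SelfDescribingClusterWritable() {}

  SelfDescribingClusterWritable(SimpleCluster value) {
    this.value = value;
  }

  SimpleCluster get() {
    return value;
  }

  void write(DataOutput out) throws IOException {
    out.writeUTF(value.getClass().getName()); // type tag
    value.write(out);                          // cluster fields
  }

  void readFields(DataInput in) throws IOException {
    String className = in.readUTF();
    try {
      value = (SimpleCluster) Class.forName(className).getDeclaredConstructor().newInstance();
    } catch (ReflectiveOperationException e) {
      throw new IOException("Cannot instantiate " + className, e);
    }
    value.readFields(in);
  }
}

public class Main {
  public static void main(String[] args) throws IOException {
    // Serialize one wrapped cluster to an in-memory buffer.
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    new SelfDescribingClusterWritable(new CenterCluster(7, new double[] {1.0, 2.0}))
        .write(new DataOutputStream(bytes));

    // Read it back without naming CenterCluster at the call site.
    SelfDescribingClusterWritable copy = new SelfDescribingClusterWritable();
    copy.readFields(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
    System.out.println(copy.get().describe()); // prints CenterCluster{id=7, dims=2}
  }
}
```

    The trade-off in this sketch is a few extra bytes per record in exchange for being able to decode clusters without knowing their concrete type beforehand, which appears to be the property that lets different algorithms share one iterator and classifier.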

    Activity

    Jeff Eastman created issue.
    Paritosh Ranjan made changes:
      Summary
        Original value: Convert Canopy, MeanShift and Other Tools to Use ClusterWritable
        New value: Convert Canopy, MeanShift, K-means, Dirichlet, Fuzzy KMeans and Other Tools to emit ClusterWritable
      Description
        Original value: The new ClusterIterator and ClusterClassifier uses an expanded sequence file representation that stores Clusters as self-describing ClusterWritable objects. Adjust the Canopy and MeanShift implementations which do not use this approach to emit ClusterWritables instead of Clusters. Adjust the other clustering tools (ClusterDumper and ClusterEvaluators) to accept ClusterWritables produced by these algorithms.
        New value: Adjust the Canopy, MeanShift, K-means, Dirichlet and Fuzzy KMeans implementations to emit ClusterWritables instead of Clusters. Adjust the other clustering tools (ClusterDumper and ClusterEvaluators) to accept ClusterWritables produced by these algorithms.
        The new ClusterIterator and ClusterClassifier uses an expanded sequence file representation that stores Clusters as self-describing ClusterWritable objects. So, once all of these algorithms will start emitting ClusterWritables, then KMeans, Dirichlet and FuzzyK will be able to use ClusterIterator and ClusterClassifier for buildClusters phase.
    Paritosh Ranjan made changes:
      Assignee: from Jeff Eastman [ jeastman ] to Paritosh Ranjan [ paritoshranjan ]
    Paritosh Ranjan made changes:
      Status: from Open [ 1 ] to In Progress [ 3 ]
    Paritosh Ranjan made changes:
      Status: from In Progress [ 3 ] to Resolved [ 5 ]
      Resolution: set to Fixed [ 1 ]
    Sean Owen made changes:
      Status: from Resolved [ 5 ] to Closed [ 6 ]

    People

    • Assignee: Paritosh Ranjan
      Reporter: Jeff Eastman
    • Votes: 0
      Watchers: 0

    Dates

    • Created:
      Updated:
      Resolved:
