Mahout / MAHOUT-933 Implement mapreduce version of ClusterIterator / MAHOUT-991

Convert Canopy, MeanShift, K-means, Dirichlet, Fuzzy KMeans and Other Tools to emit ClusterWritable

    Details

    • Type: Sub-task
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.6
    • Fix Version/s: 0.7
    • Component/s: Clustering
    • Labels:
      None

      Description

      Adjust the Canopy, MeanShift, K-means, Dirichlet and Fuzzy KMeans implementations to emit ClusterWritables instead of Clusters. Adjust the other clustering tools (ClusterDumper and ClusterEvaluators) to accept ClusterWritables produced by these algorithms.

      The new ClusterIterator and ClusterClassifier use an expanded sequence file representation that stores Clusters as self-describing ClusterWritable objects. So, once all of these algorithms emit ClusterWritables, KMeans, Dirichlet and FuzzyK will be able to use ClusterIterator and ClusterClassifier for the buildClusters phase.
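To illustrate what "self-describing" means in the description above, here is a minimal sketch that uses only java.io and reflection. It is not Mahout's code: SketchCluster, CenterCluster, SketchClusterWritable and roundTrip are hypothetical names invented for this example, and the real ClusterWritable is built on Hadoop's Writable machinery, which is not reproduced here. The assumed idea is simply that the wrapper records the concrete class name ahead of the payload, so a reader can rebuild the right cluster type without knowing which algorithm produced the file.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.Arrays;

public class SelfDescribingSketch {

  /** Hypothetical stand-in for a cluster model; NOT Mahout's Cluster interface. */
  public interface SketchCluster {
    void write(DataOutput out) throws IOException;
    void readFields(DataInput in) throws IOException;
    String describe();
  }

  /** One concrete (hypothetical) cluster type: an id plus a dense center. */
  public static class CenterCluster implements SketchCluster {
    int id;
    double[] center = new double[0];

    /** No-arg constructor, required so the wrapper can create instances reflectively. */
    public CenterCluster() {}

    public CenterCluster(int id, double[] center) {
      this.id = id;
      this.center = center;
    }

    @Override
    public void write(DataOutput out) throws IOException {
      out.writeInt(id);
      out.writeInt(center.length);
      for (double c : center) {
        out.writeDouble(c);
      }
    }

    @Override
    public void readFields(DataInput in) throws IOException {
      id = in.readInt();
      center = new double[in.readInt()];
      for (int i = 0; i < center.length; i++) {
        center[i] = in.readDouble();
      }
    }

    @Override
    public String describe() {
      return "C-" + id + Arrays.toString(center);
    }
  }

  /** Hypothetical wrapper: writes the concrete class name, then the cluster's own fields. */
  public static class SketchClusterWritable {
    public SketchCluster value;

    public void write(DataOutput out) throws IOException {
      out.writeUTF(value.getClass().getName());
      value.write(out);
    }

    public void readFields(DataInput in) throws IOException {
      String className = in.readUTF();
      try {
        // The stored name tells the reader which cluster type to instantiate.
        value = (SketchCluster) Class.forName(className).getDeclaredConstructor().newInstance();
      } catch (ReflectiveOperationException e) {
        throw new IOException("Cannot instantiate " + className, e);
      }
      value.readFields(in);
    }
  }

  /** Serializes a cluster through the wrapper and reads it back from the bytes. */
  public static SketchCluster roundTrip(SketchCluster cluster) throws IOException {
    SketchClusterWritable writer = new SketchClusterWritable();
    writer.value = cluster;
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    writer.write(new DataOutputStream(bytes));

    SketchClusterWritable reader = new SketchClusterWritable();
    reader.readFields(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
    return reader.value;
  }

  public static void main(String[] args) throws IOException {
    SketchCluster restored = roundTrip(new CenterCluster(7, new double[] {1.0, 2.5}));
    System.out.println(restored.getClass().getSimpleName() + " " + restored.describe());
  }
}
```

Under this assumption, a consumer such as a dumper or evaluator only needs to read the wrapper and ask it for its value; it does not have to be told in advance whether the file came from Canopy, K-Means or another algorithm.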

        Activity

        Paritosh Ranjan added a comment -

        Canopy, MeanShift, K-Means, Dirichlet and Fuzzy K Means are emitting ClusterWritable now.

        All the code has been committed.

        Resolving the issue.

        Hudson added a comment -

        Integrated in Mahout-Quality #1408 (See https://builds.apache.org/job/Mahout-Quality/1408/)
        Mahout-991 Converted K-Means, Canopy, FuzzyKMeans, Dirichlet and MeanShift to emit ClusterWritable. (Revision 1304490)

        Result = SUCCESS

        Paritosh Ranjan added a comment -

        Jeff, thanks for reviewing it.

        Jeff Eastman added a comment -

        +1 Paritosh, the changes look like what I was expecting to see.

        Saikat Kanjilal added a comment -

        Paritosh,
        Did you already get fuzzykmeans working? Should I hold off on committing anything at this point, then?

        Paritosh Ranjan added a comment -

        All JUnit tests run successfully. I plan to commit this in a day or two. Please let me know if you have any concerns.

        Shannon Quinn added a comment -

        Yes! That's correct. Sorry for the confusion.

        Paritosh Ranjan added a comment -

        SpectralKMeansDriver uses KMeansDriver only at the end, for the clustering step. So the output format will be similar to KMeans.

        It's also mentioned in the Javadoc there:
        "The output format is the same as the K-means output format".

        Is that correct, or am I missing something?

        Shannon Quinn added a comment -

        I suspect it would be good to make this same conversion for the spectral clustering package, too? Within the spirit of getting all the clustering algorithms on similar APIs.

        jiraposter@reviews.apache.org added a comment -

        -----------------------------------------------------------
        This is an automatically generated e-mail. To reply, visit:
        https://reviews.apache.org/r/4450/
        -----------------------------------------------------------

        Review request for mahout.

        Summary
        -------

        Mahout-991 Converted K-Means, Canopy, FuzzyKMeans, Dirichlet and MeanShift to emit ClusterWritable.

        This addresses bug Mahout-991.
        https://issues.apache.org/jira/browse/Mahout-991

        Diffs


        trunk/core/src/main/java/org/apache/mahout/clustering/canopy/CanopyDriver.java 1302085
        trunk/core/src/main/java/org/apache/mahout/clustering/canopy/CanopyReducer.java 1302085
        trunk/core/src/main/java/org/apache/mahout/clustering/classify/ClusterClassificationDriver.java 1302085
        trunk/core/src/main/java/org/apache/mahout/clustering/classify/ClusterClassificationMapper.java 1302100
        trunk/core/src/main/java/org/apache/mahout/clustering/dirichlet/DirichletDriver.java 1302085
        trunk/core/src/main/java/org/apache/mahout/clustering/dirichlet/DirichletMapper.java 1302085
        trunk/core/src/main/java/org/apache/mahout/clustering/dirichlet/DirichletReducer.java 1302085
        trunk/core/src/main/java/org/apache/mahout/clustering/fuzzykmeans/FuzzyKMeansDriver.java 1302085
        trunk/core/src/main/java/org/apache/mahout/clustering/fuzzykmeans/FuzzyKMeansReducer.java 1302085
        trunk/core/src/main/java/org/apache/mahout/clustering/fuzzykmeans/FuzzyKMeansUtil.java 1302085
        trunk/core/src/main/java/org/apache/mahout/clustering/kmeans/KMeansDriver.java 1302085
        trunk/core/src/main/java/org/apache/mahout/clustering/kmeans/KMeansReducer.java 1302085
        trunk/core/src/main/java/org/apache/mahout/clustering/kmeans/KMeansUtil.java 1302085
        trunk/core/src/main/java/org/apache/mahout/clustering/meanshift/MeanShiftCanopyClusterMapper.java 1302085
        trunk/core/src/main/java/org/apache/mahout/clustering/meanshift/MeanShiftCanopyCreatorMapper.java 1302085
        trunk/core/src/main/java/org/apache/mahout/clustering/meanshift/MeanShiftCanopyDriver.java 1303903
        trunk/core/src/main/java/org/apache/mahout/clustering/meanshift/MeanShiftCanopyMapper.java 1302085
        trunk/core/src/main/java/org/apache/mahout/clustering/meanshift/MeanShiftCanopyReducer.java 1302085
        trunk/core/src/test/java/org/apache/mahout/clustering/canopy/TestCanopyCreation.java 1303474
        trunk/core/src/test/java/org/apache/mahout/clustering/dirichlet/TestMapReduce.java 1302085
        trunk/core/src/test/java/org/apache/mahout/clustering/fuzzykmeans/TestFuzzyKmeansClustering.java 1302085
        trunk/core/src/test/java/org/apache/mahout/clustering/kmeans/TestKmeansClustering.java 1302085
        trunk/core/src/test/java/org/apache/mahout/clustering/meanshift/TestMeanShift.java 1303890
        trunk/integration/src/main/java/org/apache/mahout/clustering/evaluation/ClusterEvaluator.java 1302085
        trunk/integration/src/main/java/org/apache/mahout/clustering/evaluation/RepresentativePointsDriver.java 1302085
        trunk/integration/src/main/java/org/apache/mahout/utils/clustering/AbstractClusterWriter.java 1302085
        trunk/integration/src/main/java/org/apache/mahout/utils/clustering/CSVClusterWriter.java 1302085
        trunk/integration/src/main/java/org/apache/mahout/utils/clustering/ClusterDumper.java 1302085
        trunk/integration/src/main/java/org/apache/mahout/utils/clustering/ClusterDumperWriter.java 1302085
        trunk/integration/src/main/java/org/apache/mahout/utils/clustering/ClusterWriter.java 1302085
        trunk/integration/src/main/java/org/apache/mahout/utils/clustering/GraphMLClusterWriter.java 1302085
        trunk/integration/src/test/java/org/apache/mahout/clustering/TestClusterDumper.java 1303282

        Diff: https://reviews.apache.org/r/4450/diff

        Testing
        -------

        Thanks,

        Paritosh

        Paritosh Ranjan added a comment - edited

        I just figured out that leaving meanshift alone will create problems. So I will convert meanshift as well before committing.

        Paritosh Ranjan added a comment -

        I have successfully converted everything except MeanShift's MR clustering.

        Jeff, can meanshift (both sequential and MR) be committed separately/later? Do you see any problems in committing MeanShiftCanopyClustering later?


          People

          • Assignee: Paritosh Ranjan
          • Reporter: Jeff Eastman