Details

    • Type: Sub-task
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.6
    • Fix Version/s: 0.7
    • Component/s: Clustering
    • Labels:

      Description

      Use ClusterClassificationDriver to refactor clustering out of DirichletDriver with outlier pruning support.
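The outlier pruning mentioned in the description can be pictured as a threshold test on per-cluster membership scores: a point is assigned to its best-scoring cluster only if that score reaches the threshold, and is otherwise dropped. The following is a minimal sketch of that idea, not Mahout's actual implementation; the class name `OutlierPruningSketch`, the methods `classify` and `assignAll`, and the use of `-1` to mark a pruned point are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; none of these names come from Mahout.
class OutlierPruningSketch {

    /**
     * Returns the index of the cluster with the highest membership score,
     * or -1 when even the best score is below the threshold (an outlier).
     */
    static int classify(double[] scores, double threshold) {
        int best = -1;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (int i = 0; i < scores.length; i++) {
            if (scores[i] > bestScore) {
                bestScore = scores[i];
                best = i;
            }
        }
        return (best >= 0 && bestScore >= threshold) ? best : -1;
    }

    /** Classifies every point; pruned points are reported as -1. */
    static List<Integer> assignAll(double[][] scoresPerPoint, double threshold) {
        List<Integer> assignments = new ArrayList<>();
        for (double[] scores : scoresPerPoint) {
            assignments.add(classify(scores, threshold));
        }
        return assignments;
    }

    public static void main(String[] args) {
        // Three points scored against two clusters (made-up numbers).
        double[][] scores = {{0.9, 0.1}, {0.4, 0.45}, {0.2, 0.8}};
        System.out.println(assignAll(scores, 0.5)); // prints [0, -1, 1]
    }
}
```

With a threshold of 0.5, the second point is pruned because neither cluster scores it high enough, while the other two keep their most likely cluster.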

      1. MAHOUT-981.txt
        37 kB
        Paritosh Ranjan

        Activity

        Paritosh Ranjan created issue -
        Paritosh Ranjan made changes -
        Field Original Value New Value
        Status Open [ 1 ] In Progress [ 3 ]
        Paritosh Ranjan added a comment -

        Refactored K-Means and Dirichlet to use ClusterClassificationDriver.

        I plan to commit this in a day or two. Please let me know if you have any concerns.

        Paritosh Ranjan made changes -
        Attachment MAHOUT-981.txt [ 12518322 ]
        Paritosh Ranjan made changes -
        Status In Progress [ 3 ] Patch Available [ 10002 ]
        Paritosh Ranjan added a comment -

        The patch is also uploaded on the review board.

        Hudson added a comment -

        Integrated in Mahout-Quality #1397 (See https://builds.apache.org/job/Mahout-Quality/1397/)
        MAHOUT-981, MAHOUT-983. Refactored K-Means Clustering and Dirichlet Clustering to use ClusterClassificationDriver.
        Using cluster.getModel().configure() in ClusterClassificationDriver in order to configure DirichletCluster for MahalanobisDistanceMeasure.
        Added/fixed test cases by:
        Using separate directories in test cases for supplying initial clusters and for storing the buildClusters output, to prevent two cluster-*-final files in the same directory.
        Writing IntWritable keys in test cases instead of LongWritable (as the ClusterClassificationDriver clusters records with IntWritable keys). (Revision 1301654)

        Result = FAILURE
        pranjan : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1301654
        Files :

        • /mahout/trunk/core/src/main/java/org/apache/mahout/clustering/classify/ClusterClassificationDriver.java
        • /mahout/trunk/core/src/main/java/org/apache/mahout/clustering/classify/ClusterClassificationMapper.java
        • /mahout/trunk/core/src/main/java/org/apache/mahout/clustering/dirichlet/DirichletClusterMapper.java
        • /mahout/trunk/core/src/main/java/org/apache/mahout/clustering/dirichlet/DirichletClusterer.java
        • /mahout/trunk/core/src/main/java/org/apache/mahout/clustering/dirichlet/DirichletDriver.java
        • /mahout/trunk/core/src/main/java/org/apache/mahout/clustering/iterator/DirichletClusteringPolicy.java
        • /mahout/trunk/core/src/main/java/org/apache/mahout/clustering/kmeans/KMeansClusterMapper.java
        • /mahout/trunk/core/src/main/java/org/apache/mahout/clustering/kmeans/KMeansClusterer.java
        • /mahout/trunk/core/src/main/java/org/apache/mahout/clustering/kmeans/KMeansDriver.java
        • /mahout/trunk/core/src/test/java/org/apache/mahout/clustering/canopy/TestCanopyCreation.java
        • /mahout/trunk/core/src/test/java/org/apache/mahout/clustering/classify/ClusterClassificationDriverTest.java
        • /mahout/trunk/core/src/test/java/org/apache/mahout/clustering/dirichlet/TestDirichletClustering.java
        • /mahout/trunk/core/src/test/java/org/apache/mahout/clustering/dirichlet/TestMapReduce.java
        • /mahout/trunk/core/src/test/java/org/apache/mahout/clustering/kmeans/TestKmeansClustering.java
        • /mahout/trunk/integration/src/test/java/org/apache/mahout/clustering/TestClusterEvaluator.java
        • /mahout/trunk/integration/src/test/java/org/apache/mahout/clustering/cdbw/TestCDbwEvaluator.java
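The commit message above notes that the tests were changed so that initial clusters and built clusters live in separate directories, avoiding two cluster-*-final entries in one place. A rough sketch of why that matters: any code that locates the final clusters by pattern becomes ambiguous once two matches exist. The class `FinalClustersLookupSketch` and the method `findFinalClusters` below are hypothetical and use plain `java.nio` on the local file system rather than Hadoop or Mahout APIs; the glob is copied from the commit message.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not taken from the Mahout code base.
class FinalClustersLookupSketch {

    /** Returns the single entry matching cluster-*-final, or fails if the match is not unique. */
    static Path findFinalClusters(Path dir) {
        List<Path> matches = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir, "cluster-*-final")) {
            for (Path p : stream) {
                matches.add(p);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        if (matches.size() != 1) {
            throw new IllegalStateException(
                "Expected exactly one cluster-*-final entry in " + dir + " but found " + matches.size());
        }
        return matches.get(0);
    }

    public static void main(String[] args) throws IOException {
        // Separate directories: each lookup is unambiguous.
        Path initial = Files.createTempDirectory("initial-clusters");
        Path output = Files.createTempDirectory("output-clusters");
        Files.createDirectory(initial.resolve("cluster-0-final"));
        Files.createDirectory(output.resolve("cluster-3-final"));
        System.out.println(findFinalClusters(output).getFileName());

        // Shared directory: a second match makes the lookup fail.
        Files.createDirectory(output.resolve("cluster-0-final"));
        try {
            findFinalClusters(output);
        } catch (IllegalStateException e) {
            System.out.println("ambiguous: " + e.getMessage());
        }
    }
}
```

Failing loudly on a non-unique match is one possible design; silently picking the first match would hide exactly the kind of intermittent test failure the commit describes.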
        Hudson added a comment -

        Integrated in Mahout-Quality #1398 (See https://builds.apache.org/job/Mahout-Quality/1398/)
        MAHOUT-981, MAHOUT-983. Fixing test cases which fail intermittently.
        The build passes on my machine (even for the last commit).
        Tried to identify all test cases that can fail intermittently and fixed them. (Revision 1301761)

        Result = SUCCESS
        pranjan : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1301761
        Files :

        • /mahout/trunk/core/src/test/java/org/apache/mahout/clustering/kmeans/TestKmeansClustering.java
        • /mahout/trunk/integration/src/test/java/org/apache/mahout/clustering/TestClusterDumper.java
        Paritosh Ranjan added a comment -

        Refactored clustering out of DirichletDriver using ClusterClassificationDriver. Dirichlet already had a threshold option, so the work for this issue is now complete.
        Resolving the issue.

        Paritosh Ranjan made changes -
        Status Patch Available [ 10002 ] Resolved [ 5 ]
        Resolution Fixed [ 1 ]
        Sean Owen made changes -
        Status Resolved [ 5 ] Closed [ 6 ]
        Transition | Time In Source Status | Execution Times | Last Executer | Last Execution Date
        Open -> In Progress | 20d 6h 11m | 1 | Paritosh Ranjan | 14/Mar/12 14:04
        In Progress -> Patch Available | 1m 12s | 1 | Paritosh Ranjan | 14/Mar/12 14:05
        Patch Available -> Resolved | 2d 14h 20m | 1 | Paritosh Ranjan | 17/Mar/12 04:25
        Resolved -> Closed | 91d 5h 9m | 1 | Sean Owen | 16/Jun/12 10:35

          People

          • Assignee:
            Paritosh Ranjan
            Reporter:
            Paritosh Ranjan
          • Votes:
            0
            Watchers:
            0
