MAHOUT-1052

Add an option to MinHashDriver that specifies the dimension of vector to hash (indexes or values)

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 0.6
    • Fix Version/s: 0.8
    • Component/s: Clustering
    • Labels:

      Description

      Add a parameter to MinHash clustering that specifies which dimension of the vector to hash (indexes or values). The current version of MinHash clustering hashes only the values of vectors. Based on discussion on the dev-mahout list, both use cases are possible and frequently encountered in practice.
      Preserve backward compatibility by defaulting the dimension to values. Add new unit tests.
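
      To illustrate the difference between the two dimensions, here is a minimal, self-contained sketch. It is not the code from the attached patch and does not use any Mahout classes: the class name, the `Dimension` enum, the linear hash family, and the truncation of values to integers are all assumptions made for the example.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only; not the MinHashMapper implementation.
public class MinHashDimensionSketch {

  /** Which part of each (index, value) element is fed to the hash functions. */
  public enum Dimension { INDEX, VALUE }

  /** Illustrative linear hash h(x) = (a * x + b) mod p, with p = 2^31 - 1. */
  public static int hash(int x, int a, int b) {
    final long p = 2147483647L;
    return (int) Math.floorMod((long) a * x + b, p);
  }

  /**
   * Computes one min-hash per (a, b) pair in params over a sparse vector
   * represented as index -> value. Values are truncated to int before
   * hashing, which is an assumption of this sketch.
   */
  public static int[] signature(Map<Integer, Double> vector, Dimension dim, int[][] params) {
    int[] sig = new int[params.length];
    Arrays.fill(sig, Integer.MAX_VALUE);
    for (Map.Entry<Integer, Double> e : vector.entrySet()) {
      int x = (dim == Dimension.INDEX) ? e.getKey() : (int) e.getValue().doubleValue();
      for (int i = 0; i < params.length; i++) {
        sig[i] = Math.min(sig[i], hash(x, params[i][0], params[i][1]));
      }
    }
    return sig;
  }

  public static void main(String[] args) {
    int[][] params = {{3, 1}, {5, 2}};

    // Two vectors with the same non-zero indexes but different values.
    Map<Integer, Double> a = new HashMap<>();
    a.put(1, 5.0);
    a.put(7, 2.0);
    Map<Integer, Double> b = new HashMap<>();
    b.put(1, 9.0);
    b.put(7, 4.0);

    System.out.println("index signatures equal: "
        + Arrays.equals(signature(a, Dimension.INDEX, params),
                        signature(b, Dimension.INDEX, params)));
    System.out.println("value signatures equal: "
        + Arrays.equals(signature(a, Dimension.VALUE, params),
                        signature(b, Dimension.VALUE, params)));
  }
}
```

      Hashing indexes treats a vector as the set of its non-zero positions, so the two example vectors get identical signatures. Hashing values treats it as the set of its stored values, so vectors with the same positions but different contents can end up with different signatures.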

      1. MAHOUT-1052.patch
        14 kB
        Elena Smirnova
      2. MAHOUT-1052.patch
        16 kB
        Suneel Marthi

        Activity

        Elena Smirnova added a comment -

        Attached is the patch.

        Suneel Marthi added a comment -

        I can get this patch in for the 0.8 release, but the quality of the clusters is still questionable. Nevertheless, this patch is still needed; I can open another JIRA for MinHash clustering itself (based on Broder's paper). Thoughts?

        Suneel Marthi added a comment -

        This patch can be committed to trunk (as part of the 0.8 release). Cleaned up the patch to be in sync with the present codebase.

        Suneel Marthi added a comment -

        Cleaned up the patch to be compatible with the present codebase. Uploading new patch.

        Suneel Marthi added a comment -

        Patch committed to trunk

        Hudson added a comment -

        Integrated in Mahout-Quality #2036 (See https://builds.apache.org/job/Mahout-Quality/2036/)
        MAHOUT-1052: Add an option to MinHashDriver that specifies the dimension of vector to hash (indexes or values) (Revision 1489281)

        Result = SUCCESS
        smarthi :
        Files :

        • /mahout/trunk/CHANGELOG
        • /mahout/trunk/core/src/main/java/org/apache/mahout/clustering/minhash/MinHashDriver.java
        • /mahout/trunk/core/src/main/java/org/apache/mahout/clustering/minhash/MinHashMapper.java
        • /mahout/trunk/core/src/main/java/org/apache/mahout/clustering/minhash/MinhashOptionCreator.java
        • /mahout/trunk/core/src/test/java/org/apache/mahout/clustering/minhash/TestMinHashClustering.java

          People

          • Assignee: Suneel Marthi
          • Reporter: Elena Smirnova
          • Votes: 0
          • Watchers: 3
