Mahout / MAHOUT-1052

Add an option to MinHashDriver that specifies the dimension of vector to hash (indexes or values)

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 0.6
    • Fix Version/s: 0.8
    • Component/s: Clustering
    • Labels:

      Description

      Add a parameter to MinHash clustering that specifies which dimension of the vector to hash (indexes or values). The current version of MinHash clustering hashes only the values of vectors. Based on discussion on the dev-mahout list, both use cases are possible and frequently encountered in practice.
      Preserve backward compatibility by defaulting the dimension to values. Add new unit tests.
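
      The difference between the two dimensions can be illustrated with a minimal, self-contained sketch. It is not the code from the attached patch: the class name, the Dimension enum, the linear hash function, and the use of a plain Map as a stand-in for a sparse vector are all assumptions made for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative sketch only; all names here are hypothetical, not Mahout's API.
final class MinHashDimensionSketch {

    /** Which part of each non-zero vector element is fed to the hash function. */
    enum Dimension { INDEX, VALUE }

    /** Toy linear hash (assumption); Mahout's own hash functions are not reproduced. */
    static int hash(int item, int seed) {
        return Math.floorMod(31 * item + seed, 1009);
    }

    /** Returns the minimum hash over the chosen dimension of a sparse vector. */
    static int minHash(Map<Integer, Double> vector, Dimension dimension, int seed) {
        int min = Integer.MAX_VALUE;
        for (Map.Entry<Integer, Double> element : vector.entrySet()) {
            // INDEX hashes the position of the element, VALUE hashes its (truncated) value.
            int item = dimension == Dimension.INDEX
                ? element.getKey()
                : (int) element.getValue().doubleValue();
            min = Math.min(min, hash(item, seed));
        }
        return min;
    }

    public static void main(String[] args) {
        // a and b share indexes {1, 4} but carry different values.
        Map<Integer, Double> a = new LinkedHashMap<>();
        a.put(1, 5.0);
        a.put(4, 7.0);
        Map<Integer, Double> b = new LinkedHashMap<>();
        b.put(1, 2.0);
        b.put(4, 9.0);

        int seed = 7;
        System.out.println("index: a=" + minHash(a, Dimension.INDEX, seed)
            + " b=" + minHash(b, Dimension.INDEX, seed));
        System.out.println("value: a=" + minHash(a, Dimension.VALUE, seed)
            + " b=" + minHash(b, Dimension.VALUE, seed));
    }
}
```

      In this sketch, vectors a and b collide when indexes are hashed, because they have the same non-zero positions, but not when values are hashed. Hashing indexes suits vectors whose positions identify items (for example term IDs), while hashing values suits vectors whose values are themselves item identifiers.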

      1. MAHOUT-1052.patch
        14 kB
        Elena Smirnova
      2. MAHOUT-1052.patch
        16 kB
        Suneel Marthi

        Activity

        Transition                    Time In Source Status   Execution Times   Last Executor    Last Execution Date
        Open -> Patch Available       8m 55s                  1                 Elena Smirnova   12/Aug/12 20:08
        Patch Available -> Resolved   295d 8h 33m             1                 Suneel Marthi    04/Jun/13 04:41
        Resolved -> Closed            244d 4h 24m             1                 Suneel Marthi    03/Feb/14 08:05
        Suneel Marthi made changes -
        Status Resolved [ 5 ] Closed [ 6 ]
        Hudson added a comment -

        Integrated in Mahout-Quality #2036 (See https://builds.apache.org/job/Mahout-Quality/2036/)
        MAHOUT-1052: Add an option to MinHashDriver that specifies the dimension of vector to hash (indexes or values) (Revision 1489281)

        Result = SUCCESS
        smarthi :
        Files :

        • /mahout/trunk/CHANGELOG
        • /mahout/trunk/core/src/main/java/org/apache/mahout/clustering/minhash/MinHashDriver.java
        • /mahout/trunk/core/src/main/java/org/apache/mahout/clustering/minhash/MinHashMapper.java
        • /mahout/trunk/core/src/main/java/org/apache/mahout/clustering/minhash/MinhashOptionCreator.java
        • /mahout/trunk/core/src/test/java/org/apache/mahout/clustering/minhash/TestMinHashClustering.java
        Suneel Marthi made changes -
        Status Patch Available [ 10002 ] Resolved [ 5 ]
        Fix Version/s 0.8 [ 12320153 ]
        Fix Version/s Backlog [ 12318886 ]
        Resolution Fixed [ 1 ]
        Suneel Marthi added a comment -

        Patch committed to trunk

        Suneel Marthi made changes -
        Attachment MAHOUT-1052.patch [ 12585905 ]
        Suneel Marthi added a comment -

        Cleaned up the patch to be compatible with present codebase. Uploading new patch.

        Suneel Marthi added a comment -

        This patch can be committed to trunk (as part of 0.8 release). Cleaned up the patch to be in sync with present codebase.

        Suneel Marthi added a comment -

        I can get this patch in for the 0.8 release, but the quality of the clusters is still questionable. Nevertheless, this patch is still needed; I can open another JIRA for MinHash clustering itself (based on Broder's paper). Thoughts?

        Suneel Marthi made changes -
        Assignee Suneel Marthi [ smarthi ]
        Elena Smirnova made changes -
        Fix Version/s Backlog [ 12318886 ]
        Elena Smirnova made changes -
        Status Open [ 1 ] Patch Available [ 10002 ]
        Elena Smirnova made changes -
        Field Original Value New Value
        Attachment MAHOUT-1052.patch [ 12540564 ]
        Elena Smirnova added a comment -

        Attached is the patch.

        Elena Smirnova created issue -

          People

          • Assignee: Suneel Marthi
          • Reporter: Elena Smirnova
          • Votes: 0
          • Watchers: 3
